This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
#packages
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
pitches <- read.csv('C:\\Users\\Nick\\UCSB Baseball\\All_College_TM_19_22.csv')
Add a column that specifies how many pitches a pitcher has already thrown in the game. The count is zero-based, so a pitcher’s first pitch of a game gets 0.
pitches$pitch_count <- with(pitches, ave(seq_along(paste(GameID, PitcherId)), paste(GameID, PitcherId), FUN = seq_along)) - 1
# Add a new factor column to the dataframe for the pitch group
pitches$pitch_group <- as.factor(ifelse(pitches$pitch_count < 100, (pitches$pitch_count) %/% 10 + 1, 11))
# Check the updated dataframe
head(pitches, 250)
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 2022-02-18 | 13:32:19.86 | 1 | 1 | Kniskern, Trevor | 1000054486 | |
| 2 | 2 | 2 | 2022-02-18 | 13:32:36.00 | 1 | 2 | Kniskern, Trevor | 1000054486 | |
| 3 | 3 | 3 | 2022-02-18 | 13:33:12.45 | 1 | 3 | Kniskern, Trevor | 1000054486 | |
| 4 | 4 | 4 | 2022-02-18 | 13:33:53.17 | 2 | 1 | Kniskern, Trevor | 1000054486 | |
| 5 | 5 | 5 | 2022-02-18 | 13:34:10.28 | 2 | 2 | Kniskern, Trevor | 1000054486 | |
| 6 | 6 | 6 | 2022-02-18 | 13:34:29.80 | 2 | 3 | Kniskern, Trevor | 1000054486 | |
| 7 | 7 | 7 | 2022-02-18 | 13:34:50.36 | 2 | 4 | Kniskern, Trevor | 1000054486 | |
| 8 | 8 | 8 | 2022-02-18 | 13:35:24.42 | 2 | 5 | Kniskern, Trevor | 1000054486 | |
| 9 | 9 | 9 | 2022-02-18 | 13:36:11.69 | 3 | 1 | Kniskern, Trevor | 1000054486 | |
| 10 | 10 | 10 | 2022-02-18 | 13:36:36.90 | 3 | 2 | Kniskern, Trevor | 1000054486 |
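To make the zero-based count and the bin arithmetic concrete, here is a minimal sketch on toy data. The GameID and PitcherId values and the helper name `bin_of` are made up for illustration, and `pmin()` is used as a compact stand-in for the `ifelse()` cap rather than the notebook's exact code.

```r
# Toy data: pitcher 11 starts, pitcher 22 relieves (hypothetical ids)
toy <- data.frame(
  GameID    = c("g1", "g1", "g1", "g1", "g1"),
  PitcherId = c(11, 11, 11, 22, 22)
)

# Number the rows within each GameID/PitcherId key, then subtract 1 so the
# value is "pitches thrown before this one"
key <- paste(toy$GameID, toy$PitcherId)
toy$pitch_count <- ave(seq_along(key), key, FUN = seq_along) - 1

# Bins: 0-9 -> 1, 10-19 -> 2, ..., 90-99 -> 10, 100 and up -> 11
bin_of <- function(pitch_count) pmin(pitch_count %/% 10 + 1, 11)
bins <- bin_of(c(0, 9, 10, 99, 100, 140))

print(toy)
print(bins)
```

Because the count starts at 0, `pitch_count == 100` is a pitcher's 101st pitch, which is why bin 11 starts there.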
I want better names for the pitch_group levels
pitches$pitch_bin <- pitches$pitch_group
pitches$pitch_group <- NA
pitches$pitch_group[pitches$pitch_bin == '1'] <- '0-9 Pitches'
pitches$pitch_group[pitches$pitch_bin == '2'] <- '10-19 Pitches'
pitches$pitch_group[pitches$pitch_bin == '3'] <- '20-29 Pitches'
pitches$pitch_group[pitches$pitch_bin == '4'] <- '30-39 Pitches'
pitches$pitch_group[pitches$pitch_bin == '5'] <- '40-49 Pitches'
pitches$pitch_group[pitches$pitch_bin == '6'] <- '50-59 Pitches'
pitches$pitch_group[pitches$pitch_bin == '7'] <- '60-69 Pitches'
pitches$pitch_group[pitches$pitch_bin == '8'] <- '70-79 Pitches'
pitches$pitch_group[pitches$pitch_bin == '9'] <- '80-89 Pitches'
pitches$pitch_group[pitches$pitch_bin == '10'] <- '90-99 Pitches'
# Bin 11 covers pitch_count >= 100, so this label also includes exactly 100
pitches$pitch_group[pitches$pitch_bin == '11'] <- '> 100 Pitches'
# Make sure the order is correct. It's really annoying if regression output isn't in ascending order
sqldf("SELECT pitch_group, count(*) from pitches GROUP BY pitch_group ORDER BY pitch_group")
| pitch_group <chr> | count(*) <int> | |||
|---|---|---|---|---|
| 0-9 Pitches | 297797 | |||
| 10-19 Pitches | 240134 | |||
| 20-29 Pitches | 170913 | |||
| 30-39 Pitches | 121986 | |||
| 40-49 Pitches | 90307 | |||
| 50-59 Pitches | 69952 | |||
| 60-69 Pitches | 55977 | |||
| 70-79 Pitches | 42704 | |||
| 80-89 Pitches | 29769 | |||
| 90-99 Pitches | 16710 |
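The `ORDER BY` check shows how SQLite sorts the label strings, but R sorts character values by its own locale rules when it builds factor levels, so the two orders may not agree. One option, sketched here under that assumption, is to store the labels as a factor with explicit levels. The names `bin_labels` and the toy `pitch_bin` vector are illustrative only.

```r
# The eleven labels in the intended ascending order
bin_labels <- c(
  paste0(seq(0, 90, by = 10), "-", seq(9, 99, by = 10), " Pitches"),
  "> 100 Pitches"
)

# Toy stand-in for pitches$pitch_bin, which is a factor of bin numbers
pitch_bin <- factor(c(1, 11, 2, 10))

# Convert the factor to its numeric labels before indexing, then fix the level order
pitch_group <- factor(
  bin_labels[as.integer(as.character(pitch_bin))],
  levels = bin_labels
)

print(levels(pitch_group))
```

With explicit levels, model output and tables follow the bin order regardless of how the strings would sort.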
Check for mistakes. Sometimes Trackman doesn’t change PitcherId when a new pitcher comes in.
library(sqldf)
library(dplyr)
sqldf("SELECT pitch_count, count(*) from pitches GROUP BY pitch_count")
| pitch_count <dbl> | count(*) <int> | |||
|---|---|---|---|---|
| 0 | 30862 | |||
| 1 | 30771 | |||
| 2 | 30656 | |||
| 3 | 30451 | |||
| 4 | 30168 | |||
| 5 | 29814 | |||
| 6 | 29431 | |||
| 7 | 29042 | |||
| 8 | 28564 | |||
| 9 | 28038 |
pitches[pitches$pitch_count > 120, ] %>% arrange(GameID)
| X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|
| 869644 | 248 | 2019-03-29 | 9:42:46 PM | 3 | 4 | Lunn, Connor | 8899825 | |
| 869645 | 249 | 2019-03-29 | 9:43:26 PM | 4 | 1 | Lunn, Connor | 8899825 | |
| 869646 | 250 | 2019-03-29 | 9:43:56 PM | 4 | 2 | Lunn, Connor | 8899825 | |
| 869647 | 251 | 2019-03-29 | 9:44:36 PM | 4 | 3 | Lunn, Connor | 8899825 | |
| 869275 | 212 | 2019-03-29 | 8:57:59 PM | 7 | 5 | Bibee, Tanner | 1000018470 | |
| 880040 | 249 | 2019-03-30 | 6:37:14 PM | 5 | 5 | Koski, Jon | 1000030130 | |
| 880041 | 250 | 2019-03-30 | 6:37:35 PM | 5 | 6 | Koski, Jon | 1000030130 | |
| 880072 | 281 | 2019-03-30 | 6:55:05 PM | 1 | 1 | Koski, Jon | 1000030130 | |
| 880073 | 282 | 2019-03-30 | 6:55:17 PM | 1 | 2 | Koski, Jon | 1000030130 | |
| 880074 | 283 | 2019-03-30 | 6:55:31 PM | 1 | 3 | Koski, Jon | 1000030130 |
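Scanning raw pitch rows for long outings gets noisy, so one option is to collapse to one row per game and pitcher first. This is only a sketch on toy data: the threshold of 4 and the column name `total_pitches` are assumptions chosen for illustration (the notebook itself filters on `pitch_count > 120`).

```r
# Toy data: pitcher A throws 5 pitches in g1, pitcher B throws 3 in g2
toy <- data.frame(
  GameID      = c(rep("g1", 5), rep("g2", 3)),
  Pitcher     = c(rep("A", 5), rep("B", 3)),
  pitch_count = c(0:4, 0:2)
)

# One row per outing: the largest zero-based count, plus 1, is the total thrown
outings <- aggregate(pitch_count ~ GameID + Pitcher, data = toy, FUN = max)
outings$total_pitches <- outings$pitch_count + 1

# Outings above the (assumed) threshold are the ones to check against a box score
suspects <- outings[outings$total_pitches > 4, ]
print(suspects)
```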
Connor Lunn seems incorrect. USC stats show Charles Acker went 8.2 innings in that game, though. Probably just entirely incorrect. https://usctrojans.com/sports/baseball/stats/2019/ucla/boxscore/22237
Tanner Bibee seems correct. He went 6 innings that day with 7 ER.
Koski, Jon is correct. Went 6.1 innings.
Coates, Chandler is correct. Went 8 innings.
Jordan Marks is not correct. Marks only went 6 (105 pitches). Will Wheeler went the last 3. https://upstatespartans.com/sports/baseball/stats/2019/radford/boxscore/630
Max Meyer is correct. Went 7 innings, 123 pitches that game.
Kade Strowd is correct.
Charles Hall is correct, 134 pitches.
Jordan Wicks is correct, 9 IP, 129 pitches.
Jake Agnos is correct, 129.
Max Meyer (4-26) is correct, 125.
Alek Manoah 124, 126 on 5/23.
Bryce Elder is correct, 122.
Tanner Bibee is correct again, 9 IP, 133.
Zack Thompson is correct, 6.1 IP.
Kyle Murphy is incorrect. He only went 2 innings. Stiehl and Josh Winkler are included in his count. https://nuhuskies.com/sports/baseball/stats/2020/alabama/boxscore/10415
Peyton W is correct
Chandler Fochs is weird. This dataset says his 123rd pitch was in the 4th inning. Bryan Rumping took over from Fochs after 56. Definitely check out this game: 20220306-UNCCharlotte-1 https://goleathernecks.com/boxscore.aspx?id=12858&path=baseball
Andrew Patrick is fine, 122.
Jeff Wilson is fine, 9 IP, 129.
The NA’s (GameID 20220318-Lipscomb-1): Isaiah Magwood started and went 102, Reid Fagerstrom went 25, Trevor Andrews 14.
Miles Smith is fine.
Isaiah Coupet is fine, 124.
Julien Hernandez did go 9, but the box score says he threw 118 instead of the 122 Trackman has (maybe it counted warmup pitches?).
D’Alessio, Andrew: 6.0 IP, 120 pitches instead of 125.
John Michael Bertrand: fine.
Cole Larsen: 9 IP, fine. 6.2 IP, 122 on 4-23.
Justin Parker: assumed fine.
Fischer Paulsen: assumed fine, 7.2 IP.
Ivan Martinez: 6.1 IP, 32 batters. Looks fine.
Jack Perkins: 128, fine.
Daniel Hegarty: fine.
Joshua South: assumed fine.
Riley Egloff: issue. He pitched 6 innings, 112 pitches. Might include warm-up throws or something. https://golobos.com/boxscore/nevada-12/
Paul Skenes: 8 IP, 123 pitches, so slightly off. Trackman has 3 or 4 too many.
Peyton Wiggington: 9 IP, 128, so Trackman has two fewer.
Joshua South (5/14) is correct.
Cam Reeves: assumed correct.
Taylor Grant: correct.
Kirschsieper, Cole only threw 98 pitches. Ty Rybarczy came in for 23 and then Alex Vera finished with 51. https://fightingillini.com/sports/baseball/stats/2022/penn-state/boxscore/23558
Jacob Cravey only went 6.2. Alex Goff came in for the last 2.1. https://samfordsports.com/boxscore.aspx?path=baseball&id=12292
Tomasic, Connor only went 6.1 (100 pitches), Kyle Bischoff did the last 2.2 (40 pitches) https://bigten.org/boxscore.aspx?id=jhOYpxOXr63Fu7gkOpr76ZNub%2B51dDNCK2IY59m%2BpdLLZoN7nnqBYyrtcwLzCyhT71VLOajDDnc2bi%2BWgpv7bhQbjnwVxvuPMdtW9hhk%2Bk%2FeQOeE6RumhMwny5z6HwOz&path=baseball
Tyler Stultz (5-20) is off by 1, but whatever. (5-26) is correct.
Nick Dean only went five innings (87 pitches). Nigel Belgrave did the 6th and 7th. Together they threw 131, so there are ten missing, or possibly correctly assigned to Belgrave. 20220520-PurdueUniversity-1 https://umterps.com/sports/baseball/stats/2022/purdue/boxscore/12805
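The notes above compare Trackman totals with box scores by hand. A small sketch of the same comparison in code, using the two outings where the notes give both numbers; the `tolerance` of 2 pitches and the column names are assumptions for illustration, not a rule from the notebook.

```r
# Totals taken from the notes: Trackman count vs. box-score count
checks <- data.frame(
  pitcher  = c("Hernandez, Julien", "D'Alessio, Andrew"),
  trackman = c(122, 125),
  boxscore = c(118, 120),
  stringsAsFactors = FALSE
)

# Assumed tolerance: differences of a pitch or two are treated as noise
tolerance <- 2
checks$diff <- checks$trackman - checks$boxscore
checks$flag <- abs(checks$diff) > tolerance

print(checks)
```

A table like this makes it easier to separate small counting differences from outings where another pitcher's rows were probably merged in.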
Why are most 120+ pitch outings in 2019 and 2022? Are there fewer pitches in general in 2020 and 2021 because of COVID or something?
sqldf("SELECT SUBSTRING(Date, 1, 4) as year, count(*), pitch_group FROM pitches group by SUBSTRING(Date, 1, 4), pitch_group ORDER BY SUBSTRING(Date, 1, 4), pitch_group")
| year <chr> | count(*) <int> | pitch_group <chr> | ||
|---|---|---|---|---|
| | 8 | 0-9 Pitches | ||
| | 7 | 10-19 Pitches | ||
| | 10 | 20-29 Pitches | ||
| | 3 | 30-39 Pitches | ||
| | 1 | 40-49 Pitches | ||
| | 3 | 50-59 Pitches | ||
| | 1 | 60-69 Pitches | ||
| | 3 | 80-89 Pitches | ||
| | 6 | > 100 Pitches | ||
| 2019 | 52419 | 0-9 Pitches | ||
sqldf("SELECT SUBSTRING(Date, 1, 4) as year, count(*) FROM pitches group by SUBSTRING(Date, 1, 4) ORDER BY SUBSTRING(Date, 1, 4)")
| year <chr> | count(*) <int> | |||
|---|---|---|---|---|
| | 42 | |||
| 2019 | 206946 | |||
| 2020 | 161249 | |||
| 2021 | 8537 | |||
| 2022 | 767681 | |||
Looks like there are hardly any in 2021. We’ll keep this in mind.
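To put a number on "hardly any", here is a quick sketch that turns the yearly totals from the query output into shares. The counts are copied from that output; the `(blank)` label for the 42 rows with no date is my own naming.

```r
# Yearly pitch totals from the query output; "(blank)" marks rows with an empty Date
counts <- c("(blank)" = 42, "2019" = 206946, "2020" = 161249,
            "2021" = 8537, "2022" = 767681)

# Share of all pitches per year, as a percentage
share_pct <- 100 * counts / sum(counts)
print(round(share_pct, 2))
```

On these totals, 2021 comes out at well under 1% of all pitches, so any per-year comparison involving 2021 rests on very little data.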
## Fix Mistakes Found Above
Since we’re fixing stuff, don’t forget to redo the second and third code chunks to get accurate pitch_count and pitch_group columns.
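Since the counts have to be rebuilt after every round of PitcherId fixes, one option is to wrap the counting step in a function and call it again when the fixes are done. This is a sketch, not the notebook's code: the function name `recount_pitches` is hypothetical, and it writes a numeric `pitch_bin` rather than the labeled `pitch_group`.

```r
# Hypothetical helper: rebuild pitch_count and the numeric bin from scratch
recount_pitches <- function(df) {
  key <- paste(df$GameID, df$PitcherId)
  df$pitch_count <- ave(seq_along(key), key, FUN = seq_along) - 1
  df$pitch_bin <- pmin(df$pitch_count %/% 10 + 1, 11)
  df
}

# Toy game where four pitches were all logged under one id
toy <- data.frame(GameID = "g1", PitcherId = c(11, 11, 11, 11),
                  stringsAsFactors = FALSE)
toy <- recount_pitches(toy)

# Hand the last two pitches to a reliever, then recount
toy$PitcherId[toy$pitch_count >= 2] <- 22
toy <- recount_pitches(toy)
print(toy)
```

After the second call the reliever's count restarts at 0, which is the behavior the reassignments in this section rely on.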
Fix Connor Lunn’s 3-29-2019 outing. All 125 pitches assigned to him were thrown by Charles Acker. Actually, Charles Acker was in high school in 2019. It seems like the mistake might be in the box score on the USC website. Lunn and Acker both wore number 35. I’ll come back to this.
# Find Charles Acker's PitcherId
pitches[pitches$PitcherTeam == 'USC_UPS' & substr(pitches$Date, 1, 4) == '2019', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 905297 | 905297 | 12 | 2019-04-05 | 6:09:18 PM | 1 | 1 | Marks, Jordan | 1000025130 | |
| 905298 | 905298 | 13 | 2019-04-05 | 6:09:53 PM | 1 | 2 | Marks, Jordan | 1000025130 | |
| 905299 | 905299 | 14 | 2019-04-05 | 6:10:10 PM | 1 | 3 | Marks, Jordan | 1000025130 | |
| 905300 | 905300 | 15 | 2019-04-05 | 6:10:51 PM | 1 | 4 | Marks, Jordan | 1000025130 | |
| 905301 | 905301 | 16 | 2019-04-05 | 6:11:07 PM | 1 | 5 | Marks, Jordan | 1000025130 | |
| 905302 | 905302 | 17 | 2019-04-05 | 6:11:37 PM | 2 | 1 | Marks, Jordan | 1000025130 | |
| 905303 | 905303 | 18 | 2019-04-05 | 6:11:52 PM | 2 | 2 | Marks, Jordan | 1000025130 | |
| 905304 | 905304 | 19 | 2019-04-05 | 6:12:07 PM | 2 | 3 | Marks, Jordan | 1000025130 | |
| 905305 | 905305 | 20 | 2019-04-05 | 6:12:25 PM | 2 | 4 | Marks, Jordan | 1000025130 | |
| 905306 | 905306 | 21 | 2019-04-05 | 6:13:16 PM | 3 | 1 | Marks, Jordan | 1000025130 |
Jordan Marks has 141 pitches in the 4-5-19 game, but he only threw 105. We need to give the last 36 to Will Wheeler and reset the pitch count. GameID: 20190405-CarterMemorial-1
# Find Will Wheeler's PitcherId
pitches[pitches$Pitcher == 'Wheeler, Will', ]
| X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | PitcherThrows <chr> | PitcherTeam <chr> |
|---|---|---|---|---|---|---|---|---|---|
pitches[pitches$PitcherTeam == 'USC_UPS' & substr(pitches$Date, 1, 4) == '2019', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 905297 | 905297 | 12 | 2019-04-05 | 6:09:18 PM | 1 | 1 | Marks, Jordan | 1000025130 | |
| 905298 | 905298 | 13 | 2019-04-05 | 6:09:53 PM | 1 | 2 | Marks, Jordan | 1000025130 | |
| 905299 | 905299 | 14 | 2019-04-05 | 6:10:10 PM | 1 | 3 | Marks, Jordan | 1000025130 | |
| 905300 | 905300 | 15 | 2019-04-05 | 6:10:51 PM | 1 | 4 | Marks, Jordan | 1000025130 | |
| 905301 | 905301 | 16 | 2019-04-05 | 6:11:07 PM | 1 | 5 | Marks, Jordan | 1000025130 | |
| 905302 | 905302 | 17 | 2019-04-05 | 6:11:37 PM | 2 | 1 | Marks, Jordan | 1000025130 | |
| 905303 | 905303 | 18 | 2019-04-05 | 6:11:52 PM | 2 | 2 | Marks, Jordan | 1000025130 | |
| 905304 | 905304 | 19 | 2019-04-05 | 6:12:07 PM | 2 | 3 | Marks, Jordan | 1000025130 | |
| 905305 | 905305 | 20 | 2019-04-05 | 6:12:25 PM | 2 | 4 | Marks, Jordan | 1000025130 | |
| 905306 | 905306 | 21 | 2019-04-05 | 6:13:16 PM | 3 | 1 | Marks, Jordan | 1000025130 |
# No games for Will Wheeler, so I'll create a new PitcherId for him. First I'll make sure it's not being used
pitches[(!(is.na(pitches$PitcherId)) & pitches$PitcherId == 1000025131), ]
| X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | PitcherThrows <chr> | PitcherTeam <chr> |
|---|---|---|---|---|---|---|---|---|---|
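Eyeballing an empty result works, but the same check can be written as a single TRUE/FALSE. A minimal sketch, assuming a toy vector in place of `pitches$PitcherId`; `%in%` is used because it returns FALSE rather than NA when the column contains missing ids.

```r
# Toy stand-in for pitches$PitcherId, including a missing id
existing_ids <- c(1000025130, NA, 673924)

# The id we would like to hand out
candidate_id <- 1000025131

# TRUE means no row already uses this id
id_is_free <- !(candidate_id %in% existing_ids)
print(id_is_free)
```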
# 1000025131 is unused, so it is good
# Replace PitcherId for this GameID and PitcherTeam from the 105th pitch on (pitch_count is zero-based, so pitch_count >= 104)
# Check entire rows first
pitches[pitches$GameID == '20190405-CarterMemorial-1' & pitches$PitcherTeam == 'USC_UPS' & pitches$pitch_count >= 104, ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 905519 | 905519 | 234 | 2019-04-05 | 8:18:24 PM | 1 | 1 | Marks, Jordan | 1000025130 | |
| 905520 | 905520 | 235 | 2019-04-05 | 8:18:37 PM | 1 | 2 | Marks, Jordan | 1000025130 | |
| 905521 | 905521 | 236 | 2019-04-05 | 8:18:54 PM | 1 | 3 | Marks, Jordan | 1000025130 | |
| 905522 | 905522 | 237 | 2019-04-05 | 8:19:10 PM | 1 | 4 | Marks, Jordan | 1000025130 | |
| 905523 | 905523 | 238 | 2019-04-05 | 8:19:26 PM | 1 | 5 | Marks, Jordan | 1000025130 | |
| 905524 | 905524 | 239 | 2019-04-05 | 8:20:05 PM | 2 | 1 | Marks, Jordan | 1000025130 | |
| 905525 | 905525 | 240 | 2019-04-05 | 8:20:21 PM | 2 | 2 | Marks, Jordan | 1000025130 | |
| 905526 | 905526 | 241 | 2019-04-05 | 8:20:36 PM | 2 | 3 | Marks, Jordan | 1000025130 | |
| 905527 | 905527 | 242 | 2019-04-05 | 8:21:12 PM | 3 | 1 | Marks, Jordan | 1000025130 | |
| 905528 | 905528 | 243 | 2019-04-05 | 8:21:29 PM | 3 | 2 | Marks, Jordan | 1000025130 |
#Now Replace
pitches$PitcherId[pitches$GameID == '20190405-CarterMemorial-1' & pitches$PitcherTeam == 'USC_UPS' & pitches$pitch_count >= 104] <- 1000025131
# Replace the name too, why not
pitches$Pitcher[pitches$GameID == '20190405-CarterMemorial-1' & pitches$PitcherTeam == 'USC_UPS' & pitches$pitch_count >= 104] <- 'Wheeler, Will'
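The same "filter by game, team and pitch_count, then overwrite PitcherId and Pitcher" pattern repeats for every fix in this section, so it could be factored into a helper. This is a sketch under that assumption: `reassign_pitches` and its arguments are hypothetical names, and the toy pitchers are invented.

```r
# Hypothetical helper: give pitch_count from..to in one game/team to another pitcher
reassign_pitches <- function(df, game_id, team, from, to = Inf, new_id, new_name) {
  # which() drops NA comparisons, so rows with a missing GameID or team are skipped
  rows <- which(
    df$GameID == game_id & df$PitcherTeam == team &
      df$pitch_count >= from & df$pitch_count <= to
  )
  df$PitcherId[rows] <- new_id
  df$Pitcher[rows] <- new_name
  df
}

# Toy game: six pitches all logged under the starter
toy <- data.frame(
  GameID = "g1", PitcherTeam = "T1",
  Pitcher = "Starter, Sam", PitcherId = 1,
  pitch_count = 0:5,
  stringsAsFactors = FALSE
)

# Hand pitch_count 4 and up to the reliever
toy <- reassign_pitches(toy, "g1", "T1", from = 4,
                        new_id = 2, new_name = "Reliever, Rob")
print(toy)
```

Computing the row index once also guarantees that the id and the name are changed on exactly the same rows.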
Kyle Murphy’s 153 pitches were actually 70 for him, 69 for Stiehl, David and then 16 for Winkler, Josh. GameID 20200214-SwellThomasStadium-1
Quite a discrepancy between box score and Trackman. I’m going to say Murphy got pulled after the 2nd batter in the 3rd inning. Pitch count 68 is the first pitch for Stiehl, David. He goes until the end of the 6th. Josh Winkler starts the seventh at pitch count 137 and goes until the end.
# Look at all pitches for this game
pitches[pitches$GameID == '20200214-SwellThomasStadium-1' & pitches$PitcherTeam == 'NOR_HUS', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 980642 | 980642 | 10 | 2020-02-14 | 3:06:26 PM | 1 | 1 | Murphy, Kyle | 8892851 | |
| 980643 | 980643 | 11 | 2020-02-14 | 3:06:44 PM | 1 | 2 | Murphy, Kyle | 8892851 | |
| 980644 | 980644 | 12 | 2020-02-14 | 3:07:27 PM | 2 | 1 | Murphy, Kyle | 8892851 | |
| 980645 | 980645 | 13 | 2020-02-14 | 3:07:40 PM | 2 | 2 | Murphy, Kyle | 8892851 | |
| 980646 | 980646 | 14 | 2020-02-14 | 3:08:01 PM | 2 | 3 | Murphy, Kyle | 8892851 | |
| 980647 | 980647 | 15 | 2020-02-14 | 3:08:31 PM | 3 | 1 | Murphy, Kyle | 8892851 | |
| 980648 | 980648 | 16 | 2020-02-14 | 3:09:11 PM | 3 | 2 | Murphy, Kyle | 8892851 | |
| 980649 | 980649 | 17 | 2020-02-14 | 3:09:45 PM | 3 | 3 | Murphy, Kyle | 8892851 | |
| 980650 | 980650 | 18 | 2020-02-14 | 3:10:08 PM | 3 | 4 | Murphy, Kyle | 8892851 | |
| 980651 | 980651 | 19 | 2020-02-14 | 3:10:31 PM | 3 | 5 | Murphy, Kyle | 8892851 |
# Get PitcherIds for Stiehl and Winkler
pitches[pitches$Pitcher == 'Stiehl, David', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 784980 | 784980 | 1 | 2019-03-01 | 9:59:43 AM | 1 | 1 | Stiehl, David | 673924 | |
| 784981 | 784981 | 2 | 2019-03-01 | 9:59:59 AM | 1 | 2 | Stiehl, David | 673924 | |
| 784982 | 784982 | 3 | 2019-03-01 | 10:00:15 AM | 1 | 3 | Stiehl, David | 673924 | |
| 784983 | 784983 | 4 | 2019-03-01 | 10:00:33 AM | 1 | 4 | Stiehl, David | 673924 | |
| 784984 | 784984 | 5 | 2019-03-01 | 10:00:50 AM | 1 | 5 | Stiehl, David | 673924 | |
| 784985 | 784985 | 6 | 2019-03-01 | 10:01:49 AM | 2 | 1 | Stiehl, David | 673924 | |
| 784986 | 784986 | 7 | 2019-03-01 | 10:02:18 AM | 2 | 2 | Stiehl, David | 673924 | |
| 784987 | 784987 | 8 | 2019-03-01 | 10:03:24 AM | 2 | 3 | Stiehl, David | 673924 | |
| 784988 | 784988 | 9 | 2019-03-01 | 10:03:43 AM | 2 | 4 | Stiehl, David | 673924 | |
| 784989 | 784989 | 10 | 2019-03-01 | 10:04:02 AM | 2 | 5 | Stiehl, David | 673924 |
#673924
pitches[pitches$Pitcher == 'Winkler, Josh', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 785202 | 785202 | 223 | 2019-03-01 | 12:08:21 PM | 4 | 1 | Winkler, Josh | 1000025808 | |
| 785203 | 785203 | 224 | 2019-03-01 | 12:08:59 PM | 5 | 1 | Winkler, Josh | 1000025808 | |
| 785204 | 785204 | 225 | 2019-03-01 | 12:09:21 PM | 5 | 2 | Winkler, Josh | 1000025808 | |
| 785205 | 785205 | 226 | 2019-03-01 | 12:09:42 PM | 5 | 3 | Winkler, Josh | 1000025808 | |
| 785206 | 785206 | 227 | 2019-03-01 | 12:10:08 PM | 5 | 4 | Winkler, Josh | 1000025808 | |
| 785207 | 785207 | 228 | 2019-03-01 | 12:10:31 PM | 5 | 5 | Winkler, Josh | 1000025808 | |
| 785208 | 785208 | 229 | 2019-03-01 | 12:11:02 PM | 5 | 6 | Winkler, Josh | 1000025808 | |
| 785209 | 785209 | 230 | 2019-03-01 | 12:11:34 PM | 6 | 1 | Winkler, Josh | 1000025808 | |
| 785210 | 785210 | 231 | 2019-03-01 | 12:11:54 PM | 6 | 2 | Winkler, Josh | 1000025808 | |
| 785211 | 785211 | 232 | 2019-03-01 | 12:12:27 PM | 6 | 3 | Winkler, Josh | 1000025808 |
#1000025808
# Replace the PitcherId on Stiehl's pitches for this GameID and PitcherTeam: pitch_count 68 through 136
# Check entire rows first
pitches[pitches$GameID == '20200214-SwellThomasStadium-1' & pitches$PitcherTeam == 'NOR_HUS' & pitches$pitch_count >= 68 & pitches$pitch_count <= 136, ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 980740 | 980740 | 108 | 2020-02-14 | 3:57:01 PM | 3 | 1 | Murphy, Kyle | 8892851 | |
| 980741 | 980741 | 109 | 2020-02-14 | 3:57:23 PM | 3 | 2 | Murphy, Kyle | 8892851 | |
| 980742 | 980742 | 110 | 2020-02-14 | 3:57:49 PM | 3 | 3 | Murphy, Kyle | 8892851 | |
| 980743 | 980743 | 111 | 2020-02-14 | 3:58:22 PM | 3 | 4 | Murphy, Kyle | 8892851 | |
| 980744 | 980744 | 112 | 2020-02-14 | 3:58:46 PM | 3 | 5 | Murphy, Kyle | 8892851 | |
| 980745 | 980745 | 113 | 2020-02-14 | 3:59:40 PM | 4 | 1 | Murphy, Kyle | 8892851 | |
| 980746 | 980746 | 114 | 2020-02-14 | 4:00:11 PM | 5 | 1 | Murphy, Kyle | 8892851 | |
| 980747 | 980747 | 115 | 2020-02-14 | 4:00:27 PM | 5 | 2 | Murphy, Kyle | 8892851 | |
| 980748 | 980748 | 116 | 2020-02-14 | 4:00:45 PM | 5 | 3 | Murphy, Kyle | 8892851 | |
| 980749 | 980749 | 117 | 2020-02-14 | 4:01:04 PM | 5 | 4 | Murphy, Kyle | 8892851 |
#Now Replace
pitches$PitcherId[pitches$GameID == '20200214-SwellThomasStadium-1' & pitches$PitcherTeam == 'NOR_HUS' & pitches$pitch_count >= 68 & pitches$pitch_count <= 136] <- 673924
# Replace the name too, why not
pitches$Pitcher[pitches$GameID == '20200214-SwellThomasStadium-1' & pitches$PitcherTeam == 'NOR_HUS' & pitches$pitch_count >= 68 & pitches$pitch_count <= 136] <- 'Stiehl, David'
# Replace the PitcherId on Winkler's pitches for this GameID and PitcherTeam: pitch_count 137 and up
# Check entire rows first
pitches[pitches$GameID == '20200214-SwellThomasStadium-1' & pitches$PitcherTeam == 'NOR_HUS' & pitches$pitch_count >= 137, ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 980885 | 980885 | 253 | 2020-02-14 | 5:14:12 PM | 1 | 1 | Murphy, Kyle | 8892851 | |
| 980886 | 980886 | 254 | 2020-02-14 | 5:14:44 PM | 2 | 1 | Murphy, Kyle | 8892851 | |
| 980887 | 980887 | 255 | 2020-02-14 | 5:15:03 PM | 2 | 2 | Murphy, Kyle | 8892851 | |
| 980888 | 980888 | 256 | 2020-02-14 | 5:15:22 PM | 2 | 3 | Murphy, Kyle | 8892851 | |
| 980889 | 980889 | 257 | 2020-02-14 | 5:15:49 PM | 2 | 4 | Murphy, Kyle | 8892851 | |
| 980890 | 980890 | 258 | 2020-02-14 | 5:16:23 PM | 2 | 5 | Murphy, Kyle | 8892851 | |
| 980891 | 980891 | 259 | 2020-02-14 | 5:16:49 PM | 2 | 6 | Murphy, Kyle | 8892851 | |
| 980892 | 980892 | 260 | 2020-02-14 | 5:17:26 PM | 3 | 1 | Murphy, Kyle | 8892851 | |
| 980893 | 980893 | 261 | 2020-02-14 | 5:17:47 PM | 3 | 2 | Murphy, Kyle | 8892851 | |
| 980894 | 980894 | 262 | 2020-02-14 | 5:18:11 PM | 3 | 3 | Murphy, Kyle | 8892851 |
#Now Replace
pitches$PitcherId[pitches$GameID == '20200214-SwellThomasStadium-1' & pitches$PitcherTeam == 'NOR_HUS' & pitches$pitch_count >= 137] <- 1000025808
# Replace the name too, why not
pitches$Pitcher[pitches$GameID == '20200214-SwellThomasStadium-1' & pitches$PitcherTeam == 'NOR_HUS' & pitches$pitch_count >= 137] <- 'Winkler, Josh'
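When one logged outing has to be split among several pitchers, the cut points can be applied in one pass with `findInterval()`. A sketch using the break points (68 and 137), the 153-pitch total, and the ids from this game; the vector names are illustrative.

```r
# First zero-based pitch_count of each pitcher's stint, in order of appearance
starts   <- c(0, 68, 137)
ids      <- c(8892851, 673924, 1000025808)
pitchers <- c("Murphy, Kyle", "Stiehl, David", "Winkler, Josh")

# The 153 pitches logged under Murphy, numbered 0..152
pitch_count <- 0:152

# findInterval() returns which stint each pitch falls in (1, 2 or 3)
segment <- findInterval(pitch_count, starts)
new_id   <- ids[segment]
new_name <- pitchers[segment]

print(table(new_name))
```

The resulting split is 68, 69 and 16 pitches, which adds back up to the 153 rows being reassigned.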
Fix Chandler Fochs, GameID: 20220306-UNCCharlotte-1. Fochs, Chandler: 2.0 IP, 56 pitches; Rumping, Bryan: 1.2 IP, 67 pitches; Jaynes, Will: 1.0 IP, 47 pitches; Kratz, Caden: 1.1 IP, 27 pitches.
Chandler pulled after 56. Give pitch_count 57 through the upper bound to Rumping, Bryan.
# Look at all pitches for this game
pitches[pitches$GameID == '20220306-UNCCharlotte-1' & pitches$PitcherTeam == 'WIU_LEA', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 144847 | 144847 | 23 | 2022-03-06 | 12:13:21.49 | 1 | 1 | Fochs, Chandler | 1000057813 | |
| 144848 | 144848 | 24 | 2022-03-06 | 12:13:40.86 | 1 | 2 | Fochs, Chandler | 1000057813 | |
| 144849 | 144849 | 25 | 2022-03-06 | 12:13:55.48 | 1 | 3 | Fochs, Chandler | 1000057813 | |
| 144850 | 144850 | 26 | 2022-03-06 | 12:14:10.02 | 1 | 4 | Fochs, Chandler | 1000057813 | |
| 144851 | 144851 | 27 | 2022-03-06 | 12:14:31.38 | 1 | 5 | Fochs, Chandler | 1000057813 | |
| 144852 | 144852 | 28 | 2022-03-06 | 12:14:49.01 | 1 | 6 | Fochs, Chandler | 1000057813 | |
| 144853 | 144853 | 29 | 2022-03-06 | 12:15:29.42 | 2 | 1 | Fochs, Chandler | 1000057813 | |
| 144854 | 144854 | 30 | 2022-03-06 | 12:15:44.77 | 2 | 2 | Fochs, Chandler | 1000057813 | |
| 144855 | 144855 | 31 | 2022-03-06 | 12:16:05.45 | 2 | 3 | Fochs, Chandler | 1000057813 | |
| 144856 | 144856 | 32 | 2022-03-06 | 12:16:22.70 | 2 | 4 | Fochs, Chandler | 1000057813 |
#Find Rumping's pitcherId
pitches[pitches$Pitcher == 'Rumping, Bryan', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 606237 | 606237 | 268 | 2022-05-03 | 20:35:12.78 | 1 | 1 | Rumping, Bryan | 1000076881 | |
| 606238 | 606238 | 269 | 2022-05-03 | 20:35:24.82 | 1 | 2 | Rumping, Bryan | 1000076881 | |
| 606239 | 606239 | 270 | 2022-05-03 | 20:35:40.49 | 1 | 3 | Rumping, Bryan | 1000076881 | |
| 606240 | 606240 | 271 | 2022-05-03 | 20:36:34.37 | 2 | 1 | Rumping, Bryan | 1000076881 | |
| 606241 | 606241 | 272 | 2022-05-03 | 20:37:35.26 | 3 | 1 | Rumping, Bryan | 1000076881 | |
| 606242 | 606242 | 273 | 2022-05-03 | 20:38:15.35 | 4 | 1 | Rumping, Bryan | 1000076881 | |
| 606243 | 606243 | 274 | 2022-05-03 | 20:38:27.43 | 4 | 2 | Rumping, Bryan | 1000076881 | |
| 606244 | 606244 | 275 | 2022-05-03 | 20:38:40.08 | 4 | 3 | Rumping, Bryan | 1000076881 |
#1000076881
# Replace PitcherId for this GameID and PitcherTeam: pitch_count 57 and up
# Check entire rows first
pitches[pitches$GameID == '20220306-UNCCharlotte-1' & pitches$PitcherTeam == 'WIU_LEA' & pitches$pitch_count >= 57, ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 144918 | 144918 | 94 | 2022-03-06 | 12:54:27.94 | 1 | 1 | Fochs, Chandler | 1000057813 | |
| 144919 | 144919 | 95 | 2022-03-06 | 12:54:42.86 | 1 | 2 | Fochs, Chandler | 1000057813 | |
| 144920 | 144920 | 96 | 2022-03-06 | 12:54:59.25 | 1 | 3 | Fochs, Chandler | 1000057813 | |
| 144921 | 144921 | 97 | 2022-03-06 | 12:55:18.35 | 1 | 4 | Fochs, Chandler | 1000057813 | |
| 144922 | 144922 | 98 | 2022-03-06 | 12:55:48.11 | 1 | 5 | Fochs, Chandler | 1000057813 | |
| 144923 | 144923 | 99 | 2022-03-06 | 12:56:04.36 | 1 | 6 | Fochs, Chandler | 1000057813 | |
| 144924 | 144924 | 100 | 2022-03-06 | 12:56:38.99 | 2 | 1 | Fochs, Chandler | 1000057813 | |
| 144925 | 144925 | 101 | 2022-03-06 | 12:56:56.79 | 2 | 2 | Fochs, Chandler | 1000057813 | |
| 144926 | 144926 | 102 | 2022-03-06 | 12:57:15.14 | 2 | 3 | Fochs, Chandler | 1000057813 | |
| 144927 | 144927 | 103 | 2022-03-06 | 12:57:36.18 | 2 | 4 | Fochs, Chandler | 1000057813 |
#Now Replace
pitches$PitcherId[pitches$GameID == '20220306-UNCCharlotte-1' & pitches$PitcherTeam == 'WIU_LEA' & pitches$pitch_count >= 57] <- 1000076881
# Replace the name too, why not
pitches$Pitcher[pitches$GameID == '20220306-UNCCharlotte-1' & pitches$PitcherTeam == 'WIU_LEA' & pitches$pitch_count >= 57] <- 'Rumping, Bryan'
Fix the NA PitcherIds for this game: 20220318-Lipscomb-1
# Look at all pitches for this game
pitches[pitches$GameID == '20220318-Lipscomb-1' & pitches$PitcherTeam == 'LIP_PRA', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | PitcherThrows <chr> | |
|---|---|---|---|---|---|---|---|---|---|---|
| 224088 | 224088 | 1 | 2022-03-18 | 15:04:22.76 | 1 | 1 | | NA | Right | |
| 224089 | 224089 | 2 | 2022-03-18 | 15:04:43.93 | 1 | 2 | | NA | Right | |
| 224090 | 224090 | 3 | 2022-03-18 | 15:05:02.39 | 1 | 3 | | NA | Right | |
| 224091 | 224091 | 4 | 2022-03-18 | 15:05:34.89 | 2 | 1 | | NA | Right | |
| 224092 | 224092 | 5 | 2022-03-18 | 15:05:47.51 | 2 | 2 | | NA | Right | |
| 224093 | 224093 | 6 | 2022-03-18 | 15:06:08.21 | 2 | 3 | | NA | Right | |
| 224094 | 224094 | 7 | 2022-03-18 | 15:06:31.72 | 2 | 4 | | NA | Right | |
| 224095 | 224095 | 8 | 2022-03-18 | 15:06:53.41 | 2 | 5 | | NA | Right | |
| 224096 | 224096 | 9 | 2022-03-18 | 15:07:09.52 | 2 | 6 | | NA | Right | |
| 224097 | 224097 | 10 | 2022-03-18 | 15:07:48.75 | 3 | 1 | | NA | Right | |
# Get PitcherIds for Kantola, Williams, and Newell
pitches[pitches$Pitcher == 'Kantola, Kaleb', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 5157 | 5157 | 1 | 2022-02-18 | 14:06:25.23 | 1 | 1 | Kantola, Kaleb | 1000094167 | |
| 5158 | 5158 | 2 | 2022-02-18 | 14:06:42.17 | 1 | 2 | Kantola, Kaleb | 1000094167 | |
| 5159 | 5159 | 3 | 2022-02-18 | 14:07:04.92 | 1 | 3 | Kantola, Kaleb | 1000094167 | |
| 5160 | 5160 | 4 | 2022-02-18 | 14:07:22.59 | 1 | 4 | Kantola, Kaleb | 1000094167 | |
| 5161 | 5161 | 5 | 2022-02-18 | 14:07:49.58 | 1 | 5 | Kantola, Kaleb | 1000094167 | |
| 5162 | 5162 | 6 | 2022-02-18 | 14:08:15.73 | 1 | 6 | Kantola, Kaleb | 1000094167 | |
| 5163 | 5163 | 7 | 2022-02-18 | 14:08:37.86 | 1 | 7 | Kantola, Kaleb | 1000094167 | |
| 5164 | 5164 | 8 | 2022-02-18 | 14:09:17.73 | 2 | 1 | Kantola, Kaleb | 1000094167 | |
| 5165 | 5165 | 9 | 2022-02-18 | 14:09:36.61 | 2 | 2 | Kantola, Kaleb | 1000094167 | |
| 5166 | 5166 | 10 | 2022-02-18 | 14:10:02.62 | 2 | 3 | Kantola, Kaleb | 1000094167 |
#1000094167
pitches[pitches$Pitcher == 'Williams, Patrick', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 96294 | 96294 | 108 | 2022-03-02 | 16:11:01.94 | 5 | 1 | Williams, Patrick | 1000085191 | |
| 96295 | 96295 | 109 | 2022-03-02 | 16:11:18.76 | 5 | 2 | Williams, Patrick | 1000085191 | |
| 96296 | 96296 | 110 | 2022-03-02 | 16:11:43.41 | 5 | 3 | Williams, Patrick | 1000085191 | |
| 96297 | 96297 | 111 | 2022-03-02 | 16:12:03.18 | 5 | 4 | Williams, Patrick | 1000085191 | |
| 96298 | 96298 | 112 | 2022-03-02 | 16:12:23.51 | 5 | 5 | Williams, Patrick | 1000085191 | |
| 96306 | 96306 | 120 | 2022-03-02 | 16:19:24.90 | 1 | 1 | Williams, Patrick | 1000085191 | |
| 96307 | 96307 | 121 | 2022-03-02 | 16:19:37.76 | 1 | 2 | Williams, Patrick | 1000085191 | |
| 96308 | 96308 | 122 | 2022-03-02 | 16:20:16.00 | 2 | 1 | Williams, Patrick | 1000085191 | |
| 96309 | 96309 | 123 | 2022-03-02 | 16:20:46.75 | 3 | 1 | Williams, Patrick | 1000085191 | |
| 96330 | 96330 | 144 | 2022-03-02 | 16:32:19.57 | 1 | 1 | Williams, Patrick | 1000085191 |
#1000085191
pitches[pitches$Pitcher == 'Newell, Will', ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | |
|---|---|---|---|---|---|---|---|---|---|
| 18121 | 18121 | 298 | 2022-02-19 | 17:04:37.46 | 1 | 1 | Newell, Will | 1000106568 | |
| 18122 | 18122 | 299 | 2022-02-19 | 17:04:53.92 | 1 | 2 | Newell, Will | 1000106568 | |
| 18123 | 18123 | 300 | 2022-02-19 | 17:05:15.88 | 1 | 3 | Newell, Will | 1000106568 | |
| 18124 | 18124 | 301 | 2022-02-19 | 17:05:47.51 | 2 | 1 | Newell, Will | 1000106568 | |
| 18125 | 18125 | 302 | 2022-02-19 | 17:06:01.69 | 2 | 2 | Newell, Will | 1000106568 | |
| 18126 | 18126 | 303 | 2022-02-19 | 17:06:23.00 | 2 | 3 | Newell, Will | 1000106568 | |
| 18127 | 18127 | 304 | 2022-02-19 | 17:06:51.56 | 2 | 4 | Newell, Will | 1000106568 | |
| 18128 | 18128 | 305 | 2022-02-19 | 17:07:27.35 | 3 | 1 | Newell, Will | 1000106568 | |
| 18129 | 18129 | 306 | 2022-02-19 | 17:07:43.50 | 3 | 2 | Newell, Will | 1000106568 | |
| 18130 | 18130 | 307 | 2022-02-19 | 17:08:00.22 | 3 | 3 | Newell, Will | 1000106568 |
#1000106568
# Give pitch_count 0 through 68 to Kantola, Kaleb (69 pitches, since pitch_count is zero-based)
# Check entire rows first
pitches[pitches$GameID == '20220318-Lipscomb-1' & pitches$PitcherTeam == 'LIP_PRA' & pitches$pitch_count <= 68, ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | PitcherThrows <chr> | |
|---|---|---|---|---|---|---|---|---|---|---|
| 224088 | 224088 | 1 | 2022-03-18 | 15:04:22.76 | 1 | 1 | | NA | Right | |
| 224089 | 224089 | 2 | 2022-03-18 | 15:04:43.93 | 1 | 2 | | NA | Right | |
| 224090 | 224090 | 3 | 2022-03-18 | 15:05:02.39 | 1 | 3 | | NA | Right | |
| 224091 | 224091 | 4 | 2022-03-18 | 15:05:34.89 | 2 | 1 | | NA | Right | |
| 224092 | 224092 | 5 | 2022-03-18 | 15:05:47.51 | 2 | 2 | | NA | Right | |
| 224093 | 224093 | 6 | 2022-03-18 | 15:06:08.21 | 2 | 3 | | NA | Right | |
| 224094 | 224094 | 7 | 2022-03-18 | 15:06:31.72 | 2 | 4 | | NA | Right | |
| 224095 | 224095 | 8 | 2022-03-18 | 15:06:53.41 | 2 | 5 | | NA | Right | |
| 224096 | 224096 | 9 | 2022-03-18 | 15:07:09.52 | 2 | 6 | | NA | Right | |
| 224097 | 224097 | 10 | 2022-03-18 | 15:07:48.75 | 3 | 1 | | NA | Right | |
#Now Replace
pitches$PitcherId[pitches$GameID == '20220318-Lipscomb-1' & pitches$PitcherTeam == 'LIP_PRA' & pitches$pitch_count <= 68] <- 1000094167
# Replace the name too, why not
pitches$Pitcher[pitches$GameID == '20220318-Lipscomb-1' & pitches$PitcherTeam == 'LIP_PRA' & pitches$pitch_count <= 68] <- 'Kantola, Kaleb'
# Replace the NA PitcherId for this GameID and PitcherTeam: pitch_count 69 through 99
#Check entire rows first
pitches[pitches$GameID == '20220318-Lipscomb-1' & pitches$PitcherTeam == 'LIP_PRA' & pitches$pitch_count > 68 & pitches$pitch_count <= 99, ]
| | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | PitcherThrows <chr> | |
|---|---|---|---|---|---|---|---|---|---|---|
| 224218 | 224218 | 131 | 2022-03-18 | 16:12:36.47 | 1 | 1 | | NA | Right | |
| 224219 | 224219 | 132 | 2022-03-18 | 16:12:49.09 | 1 | 2 | | NA | Right | |
| 224220 | 224220 | 133 | 2022-03-18 | 16:13:08.12 | 1 | 3 | | NA | Right | |
| 224221 | 224221 | 134 | 2022-03-18 | 16:13:20.81 | 1 | 4 | | NA | Right | |
| 224222 | 224222 | 135 | 2022-03-18 | 16:13:38.44 | 1 | 5 | | NA | Right | |
| 224223 | 224223 | 136 | 2022-03-18 | 16:13:54.82 | 1 | 6 | | NA | Right | |
| 224224 | 224224 | 137 | 2022-03-18 | 16:14:35.45 | 2 | 1 | | NA | Right | |
| 224225 | 224225 | 138 | 2022-03-18 | 16:14:54.98 | 2 | 2 | | NA | Right | |
| 224226 | 224226 | 139 | 2022-03-18 | 16:15:16.94 | 2 | 3 | | NA | Right | |
| 224227 | 224227 | 140 | 2022-03-18 | 16:15:38.72 | 2 | 4 | | NA | Right | |
#Now Replace
pitches$PitcherId[pitches$GameID == '20220318-Lipscomb-1' & pitches$PitcherTeam == 'LIP_PRA' & pitches$pitch_count > 68 & pitches$pitch_count <= 99] <- 1000085191
#Replace Name too Why not
pitches$Pitcher[pitches$GameID == '20220318-Lipscomb-1' & pitches$PitcherTeam == 'LIP_PRA' & pitches$pitch_count > 68 & pitches$pitch_count <= 99] <- 'Williams, Patrick'
#Replace the last 50 or so pitcher IDs with Newell's
#Check entire rows first
pitches[pitches$GameID == '20220318-Lipscomb-1' & pitches$PitcherTeam == 'LIP_PRA' & pitches$pitch_count >= 100, ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | PitcherThrows <chr> | ||
|---|---|---|---|---|---|---|---|---|---|---|
| 224290 | 224290 | 203 | 2022-03-18 | 16:51:19.78 | 1 | 1 | NA | Right | ||
| 224291 | 224291 | 204 | 2022-03-18 | 16:51:31.94 | 1 | 2 | NA | Right | ||
| 224292 | 224292 | 205 | 2022-03-18 | 16:51:50.75 | 1 | 3 | NA | Right | ||
| 224293 | 224293 | 206 | 2022-03-18 | 16:52:04.27 | 1 | 4 | NA | Right | ||
| 224294 | 224294 | 207 | 2022-03-18 | 16:52:53.76 | 2 | 1 | NA | Right | ||
| 224295 | 224295 | 208 | 2022-03-18 | 16:54:51.08 | 3 | 1 | NA | Right | ||
| 224296 | 224296 | 209 | 2022-03-18 | 16:55:44.12 | 4 | 1 | NA | Right | ||
| 224297 | 224297 | 210 | 2022-03-18 | 16:56:03.63 | 4 | 2 | NA | Right | ||
| 224298 | 224298 | 211 | 2022-03-18 | 16:56:27.61 | 4 | 3 | NA | Right | ||
| 224299 | 224299 | 212 | 2022-03-18 | 16:56:52.87 | 5 | 1 | NA | Right |
#Now Replace
pitches$PitcherId[pitches$GameID == '20220318-Lipscomb-1' & pitches$PitcherTeam == 'LIP_PRA' & pitches$pitch_count >= 100] <- 1000106568
#Replace Name too Why not
pitches$Pitcher[pitches$GameID == '20220318-Lipscomb-1' & pitches$PitcherTeam == 'LIP_PRA' & pitches$pitch_count >= 100] <- 'Newell, Will'
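Each fix above follows the same pattern: build a row filter from GameID, PitcherTeam and a pitch_count range, then overwrite PitcherId and Pitcher. A minimal sketch of that pattern as a function is below. The name `reassign_pitcher` and the toy data frame are hypothetical, not part of the original notebook.

```r
# Hypothetical helper (not in the original notebook): hand a run of pitches in
# one game, for one team, to the pitcher who actually threw them.
reassign_pitcher <- function(df, game_id, team, lo, hi, new_id, new_name) {
  # which() drops NA comparisons, so rows with missing keys are never touched
  rows <- which(df$GameID == game_id & df$PitcherTeam == team &
                  df$pitch_count >= lo & df$pitch_count <= hi)
  df$PitcherId[rows] <- new_id
  df$Pitcher[rows] <- new_name
  df
}

# Toy data standing in for `pitches` (made-up IDs and names)
toy <- data.frame(
  GameID = "G1", PitcherTeam = "AAA", pitch_count = 0:5,
  PitcherId = NA_real_, Pitcher = NA_character_,
  stringsAsFactors = FALSE
)
toy <- reassign_pitcher(toy, "G1", "AAA", 0, 2, 1, "Starter, Sam")
toy <- reassign_pitcher(toy, "G1", "AAA", 3, Inf, 2, "Reliever, Rae")
toy
```

Keeping the ID and the name in one call means the two columns cannot drift apart if a cutoff is edited later.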
20220519-LubranoPark-1. Cole Kirschsieper was credited with way too many pitches; he only threw 92. Two other pitchers came in: Ty Rybarczyk threw 23, and then Alex Vera finished with 51.
# Look at all pitches for this game
pitches[pitches$GameID == '20220519-LubranoPark-1' & pitches$PitcherTeam == 'ILL_ILL', ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 726427 | 726427 | 21 | 2022-05-19 | 17:14:27.80 | 1 | 1 | Kirschsieper, Cole | 1000050785 | |
| 726428 | 726428 | 22 | 2022-05-19 | 17:14:40.74 | 1 | 2 | Kirschsieper, Cole | 1000050785 | |
| 726429 | 726429 | 23 | 2022-05-19 | 17:14:53.58 | 1 | 3 | Kirschsieper, Cole | 1000050785 | |
| 726430 | 726430 | 24 | 2022-05-19 | 17:15:48.37 | 2 | 1 | Kirschsieper, Cole | 1000050785 | |
| 726431 | 726431 | 25 | 2022-05-19 | 17:16:12.59 | 2 | 2 | Kirschsieper, Cole | 1000050785 | |
| 726432 | 726432 | 26 | 2022-05-19 | 17:16:43.71 | 2 | 3 | Kirschsieper, Cole | 1000050785 | |
| 726433 | 726433 | 27 | 2022-05-19 | 17:17:17.47 | 2 | 4 | Kirschsieper, Cole | 1000050785 | |
| 726434 | 726434 | 28 | 2022-05-19 | 17:18:05.89 | 3 | 1 | Kirschsieper, Cole | 1000050785 | |
| 726435 | 726435 | 29 | 2022-05-19 | 17:18:38.32 | 3 | 2 | Kirschsieper, Cole | 1000050785 | |
| 726436 | 726436 | 30 | 2022-05-19 | 17:19:31.13 | 4 | 1 | Kirschsieper, Cole | 1000050785 |
#Ty Rybarczyk's ID
pitches[pitches$Pitcher == 'Rybarczyk, Ty', ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 343125 | 343125 | 218 | 2022-04-01 | 19:59:11.90 | 1 | 1 | Rybarczyk, Ty | 1000084463 | |
| 343126 | 343126 | 219 | 2022-04-01 | 19:59:32.14 | 1 | 2 | Rybarczyk, Ty | 1000084463 | |
| 343127 | 343127 | 220 | 2022-04-01 | 19:59:49.79 | 1 | 3 | Rybarczyk, Ty | 1000084463 | |
| 343128 | 343128 | 221 | 2022-04-01 | 20:00:13.32 | 1 | 4 | Rybarczyk, Ty | 1000084463 | |
| 343129 | 343129 | 222 | 2022-04-01 | 20:00:37.17 | 1 | 5 | Rybarczyk, Ty | 1000084463 | |
| 343130 | 343130 | 223 | 2022-04-01 | 20:01:09.96 | 2 | 1 | Rybarczyk, Ty | 1000084463 | |
| 343131 | 343131 | 224 | 2022-04-01 | 20:02:20.86 | 3 | 1 | Rybarczyk, Ty | 1000084463 | |
| 343132 | 343132 | 225 | 2022-04-01 | 20:02:46.99 | 3 | 2 | Rybarczyk, Ty | 1000084463 | |
| 343133 | 343133 | 226 | 2022-04-01 | 20:03:09.98 | 3 | 3 | Rybarczyk, Ty | 1000084463 | |
| 343134 | 343134 | 227 | 2022-04-01 | 20:03:55.96 | 4 | 1 | Rybarczyk, Ty | 1000084463 |
#1000084463
#Alex Vera ID
pitches[pitches$Pitcher == 'Vera, Alex', ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 67540 | 67540 | 83 | 2022-02-26 | 19:31:11.59 | 6 | 1 | Vera, Alex | 1000050778 | |
| 67541 | 67541 | 84 | 2022-02-26 | 19:31:32.44 | 6 | 2 | Vera, Alex | 1000050778 | |
| 67542 | 67542 | 85 | 2022-02-26 | 19:31:53.82 | 6 | 3 | Vera, Alex | 1000050778 | |
| 67543 | 67543 | 86 | 2022-02-26 | 19:32:13.43 | 6 | 4 | Vera, Alex | 1000050778 | |
| 67544 | 67544 | 87 | 2022-02-26 | 19:32:33.93 | 6 | 5 | Vera, Alex | 1000050778 | |
| 67545 | 67545 | 88 | 2022-02-26 | 19:32:57.09 | 6 | 6 | Vera, Alex | 1000050778 | |
| 67546 | 67546 | 89 | 2022-02-26 | 19:33:46.27 | 7 | 1 | Vera, Alex | 1000050778 | |
| 67547 | 67547 | 90 | 2022-02-26 | 19:34:06.93 | 7 | 2 | Vera, Alex | 1000050778 | |
| 67548 | 67548 | 91 | 2022-02-26 | 19:34:29.90 | 7 | 3 | Vera, Alex | 1000050778 | |
| 67549 | 67549 | 92 | 2022-02-26 | 19:34:50.48 | 7 | 4 | Vera, Alex | 1000050778 |
#1000050778
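Printing every pitch a player has thrown works for finding an ID, but only the distinct name and ID pair is needed. A small sketch with toy rows standing in for `pitches` (the two IDs are the ones found above):

```r
# Toy rows standing in for `pitches`
toy <- data.frame(
  Pitcher = c("Vera, Alex", "Vera, Alex", "Rybarczyk, Ty"),
  PitcherId = c(1000050778, 1000050778, 1000084463),
  stringsAsFactors = FALSE
)
# One row per distinct name/ID pair instead of one row per pitch
ids <- unique(toy[toy$Pitcher == "Vera, Alex", c("Pitcher", "PitcherId")])
ids
```

If a name ever maps to more than one ID, this returns more than one row, which is worth knowing before overwriting anything.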
#Give Ty all the pitches from pitch_count 94 up to 117
#Check entire rows first
pitches[pitches$GameID == '20220519-LubranoPark-1' & pitches$PitcherTeam == 'ILL_ILL' & pitches$pitch_count >= 94 & pitches$pitch_count <= 117, ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 726603 | 726603 | 197 | 2022-05-19 | 18:55:32.95 | 9 | 1 | Kirschsieper, Cole | 1000050785 | |
| 726604 | 726604 | 198 | 2022-05-19 | 18:56:23.32 | 10 | 1 | Kirschsieper, Cole | 1000050785 | |
| 726605 | 726605 | 199 | 2022-05-19 | 18:56:46.96 | 10 | 2 | Kirschsieper, Cole | 1000050785 | |
| 726606 | 726606 | 200 | 2022-05-19 | 18:57:08.76 | 10 | 3 | Kirschsieper, Cole | 1000050785 | |
| 726607 | 726607 | 201 | 2022-05-19 | 18:57:29.42 | 10 | 4 | Kirschsieper, Cole | 1000050785 | |
| 726608 | 726608 | 202 | 2022-05-19 | 18:57:58.24 | 10 | 5 | Kirschsieper, Cole | 1000050785 | |
| 726624 | 726624 | 218 | 2022-05-19 | 19:08:11.34 | 1 | 1 | Kirschsieper, Cole | 1000050785 | |
| 726625 | 726625 | 219 | 2022-05-19 | 19:08:31.41 | 1 | 2 | Kirschsieper, Cole | 1000050785 | |
| 726626 | 726626 | 220 | 2022-05-19 | 19:09:05.17 | 2 | 1 | Kirschsieper, Cole | 1000050785 | |
| 726627 | 726627 | 221 | 2022-05-19 | 19:09:22.54 | 2 | 2 | Kirschsieper, Cole | 1000050785 |
#Now Replace
pitches$PitcherId[pitches$GameID == '20220519-LubranoPark-1' & pitches$PitcherTeam == 'ILL_ILL' & pitches$pitch_count >= 94 & pitches$pitch_count <= 117] <- 1000084463
#Replace Name too Why not
pitches$Pitcher[pitches$GameID == '20220519-LubranoPark-1' & pitches$PitcherTeam == 'ILL_ILL' & pitches$pitch_count >= 94 & pitches$pitch_count <= 117] <- 'Rybarczyk, Ty'
#Give Alex Vera everything above 117
#Check entire rows first
pitches[pitches$GameID == '20220519-LubranoPark-1' & pitches$PitcherTeam == 'ILL_ILL' & pitches$pitch_count > 117, ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 726661 | 726661 | 255 | 2022-05-19 | 19:35:24.09 | 1 | 1 | Kirschsieper, Cole | 1000050785 | |
| 726662 | 726662 | 256 | 2022-05-19 | 19:35:39.37 | 1 | 2 | Kirschsieper, Cole | 1000050785 | |
| 726663 | 726663 | 257 | 2022-05-19 | 19:36:01.18 | 1 | 3 | Kirschsieper, Cole | 1000050785 | |
| 726664 | 726664 | 258 | 2022-05-19 | 19:36:36.78 | 2 | 1 | Kirschsieper, Cole | 1000050785 | |
| 726665 | 726665 | 259 | 2022-05-19 | 19:36:48.89 | 2 | 2 | Kirschsieper, Cole | 1000050785 | |
| 726666 | 726666 | 260 | 2022-05-19 | 19:37:04.51 | 2 | 3 | Kirschsieper, Cole | 1000050785 | |
| 726667 | 726667 | 261 | 2022-05-19 | 19:37:18.38 | 2 | 4 | Kirschsieper, Cole | 1000050785 | |
| 726668 | 726668 | 262 | 2022-05-19 | 19:37:40.17 | 2 | 5 | Kirschsieper, Cole | 1000050785 | |
| 726669 | 726669 | 263 | 2022-05-19 | 19:37:54.80 | 2 | 6 | Kirschsieper, Cole | 1000050785 | |
| 726670 | 726670 | 264 | 2022-05-19 | 19:38:13.66 | 2 | 7 | Kirschsieper, Cole | 1000050785 |
#Now Replace
pitches$PitcherId[pitches$GameID == '20220519-LubranoPark-1' & pitches$PitcherTeam == 'ILL_ILL' & pitches$pitch_count > 117] <- 1000050778
#Replace Name too Why not
pitches$Pitcher[pitches$GameID == '20220519-LubranoPark-1' & pitches$PitcherTeam == 'ILL_ILL' & pitches$pitch_count > 117] <- 'Vera, Alex'
20220519-SamfordUniversity-2. Jacob Cravey went 6.2 IP (29 BF); Alex Goff went 2.1 IP (9 BF).
# Look at all pitches for this game
pitches[pitches$GameID == '20220519-SamfordUniversity-2' & pitches$PitcherTeam == 'SAM_BUL', ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 728744 | 728744 | 1 | 2022-05-19 | 16:03:01.01 | 1 | 1 | Cravey, Jacob | 1000101505 | |
| 728745 | 728745 | 2 | 2022-05-19 | 16:03:12.97 | 1 | 2 | Cravey, Jacob | 1000101505 | |
| 728746 | 728746 | 3 | 2022-05-19 | 16:03:26.05 | 1 | 3 | Cravey, Jacob | 1000101505 | |
| 728747 | 728747 | 4 | 2022-05-19 | 16:03:55.94 | 2 | 1 | Cravey, Jacob | 1000101505 | |
| 728748 | 728748 | 5 | 2022-05-19 | 16:04:13.76 | 2 | 2 | Cravey, Jacob | 1000101505 | |
| 728749 | 728749 | 6 | 2022-05-19 | 16:04:52.92 | 3 | 1 | Cravey, Jacob | 1000101505 | |
| 728750 | 728750 | 7 | 2022-05-19 | 16:05:06.51 | 3 | 2 | Cravey, Jacob | 1000101505 | |
| 728751 | 728751 | 8 | 2022-05-19 | 16:05:22.33 | 3 | 3 | Cravey, Jacob | 1000101505 | |
| 728752 | 728752 | 9 | | | 3 | 4 | Cravey, Jacob | 1000101505 | |
| 728786 | 728786 | 43 | | | 1 | 1 | Cravey, Jacob | 1000101505 |
#Alex Goff id
pitches[pitches$Pitcher == 'Goff, Alex', ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 98184 | 98184 | 223 | 2022-03-02 | 18:10:59.41 | 5 | 1 | Goff, Alex | 1000057623 | |
| 98185 | 98185 | 224 | 2022-03-02 | 18:11:20.15 | 5 | 2 | Goff, Alex | 1000057623 | |
| 98186 | 98186 | 225 | 2022-03-02 | 18:11:39.67 | 5 | 3 | Goff, Alex | 1000057623 | |
| 98187 | 98187 | 226 | 2022-03-02 | 18:12:04.01 | 5 | 4 | Goff, Alex | 1000057623 | |
| 98188 | 98188 | 227 | 2022-03-02 | 18:12:47.72 | 5 | 5 | Goff, Alex | 1000057623 | |
| 98189 | 98189 | 228 | 2022-03-02 | 18:13:24.18 | 5 | 6 | Goff, Alex | 1000057623 | |
| 98190 | 98190 | 229 | 2022-03-02 | 18:13:58.93 | 6 | 1 | Goff, Alex | 1000057623 | |
| 98191 | 98191 | 230 | 2022-03-02 | 18:14:21.61 | 6 | 2 | Goff, Alex | 1000057623 | |
| 98192 | 98192 | 231 | 2022-03-02 | 18:14:49.28 | 6 | 3 | Goff, Alex | 1000057623 | |
| 98193 | 98193 | 232 | 2022-03-02 | 18:15:08.36 | 6 | 4 | Goff, Alex | 1000057623 |
#1000057623
#Everything 98 and up goes to Goff
#Check entire rows first
pitches[pitches$GameID == '20220519-SamfordUniversity-2' & pitches$PitcherTeam == 'SAM_BUL' & pitches$pitch_count > 97, ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 728984 | 728984 | 241 | 2022-05-19 | 18:23:36.60 | 6 | 1 | Cravey, Jacob | 1000101505 | |
| 728985 | 728985 | 242 | 2022-05-19 | 18:24:12.14 | 6 | 2 | Cravey, Jacob | 1000101505 | |
| 728986 | 728986 | 243 | 2022-05-19 | 18:24:31.98 | 6 | 3 | Cravey, Jacob | 1000101505 | |
| 728987 | 728987 | 244 | 2022-05-19 | 18:24:57.01 | 6 | 4 | Cravey, Jacob | 1000101505 | |
| 728988 | 728988 | 245 | 2022-05-19 | 18:25:22.83 | 6 | 5 | Cravey, Jacob | 1000101505 | |
| 728997 | 728997 | 254 | 2022-05-19 | 18:33:18.37 | 1 | 1 | Cravey, Jacob | 1000101505 | |
| 728998 | 728998 | 255 | 2022-05-19 | 18:33:33.67 | 1 | 2 | Cravey, Jacob | 1000101505 | |
| 728999 | 728999 | 256 | 2022-05-19 | 18:34:11.64 | 2 | 1 | Cravey, Jacob | 1000101505 | |
| 729000 | 729000 | 257 | 2022-05-19 | 18:34:24.38 | 2 | 2 | Cravey, Jacob | 1000101505 | |
| 729001 | 729001 | 258 | 2022-05-19 | 18:34:39.28 | 2 | 3 | Cravey, Jacob | 1000101505 |
#Now Replace
pitches$PitcherId[pitches$GameID == '20220519-SamfordUniversity-2' & pitches$PitcherTeam == 'SAM_BUL' & pitches$pitch_count > 97] <- 1000057623
#Replace Name too Why not
pitches$Pitcher[pitches$GameID == '20220519-SamfordUniversity-2' & pitches$PitcherTeam == 'SAM_BUL' & pitches$pitch_count > 97] <- 'Goff, Alex'
20220519-UNebraska-1. Connor Tomasic only went 6.1 IP (100 pitches); Kyle Bischoff threw the last 2.2 IP (40 pitches).
# Look at all pitches for this game
pitches[pitches$GameID == '20220519-UNebraska-1' & pitches$PitcherTeam == 'MIC_SPA', ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 731988 | 731988 | 17 | 2022-05-19 | 18:43:48.61 | 1 | 1 | Tomasic, Connor | 1000113049 | |
| 731989 | 731989 | 18 | 2022-05-19 | 18:44:04.94 | 1 | 2 | Tomasic, Connor | 1000113049 | |
| 731990 | 731990 | 19 | 2022-05-19 | 18:44:19.22 | 1 | 3 | Tomasic, Connor | 1000113049 | |
| 731991 | 731991 | 20 | 2022-05-19 | 18:44:33.57 | 1 | 4 | Tomasic, Connor | 1000113049 | |
| 731992 | 731992 | 21 | 2022-05-19 | 18:45:01.95 | 2 | 1 | Tomasic, Connor | 1000113049 | |
| 731993 | 731993 | 22 | 2022-05-19 | 18:45:16.92 | 2 | 2 | Tomasic, Connor | 1000113049 | |
| 731994 | 731994 | 23 | 2022-05-19 | 18:45:37.42 | 2 | 3 | Tomasic, Connor | 1000113049 | |
| 731995 | 731995 | 24 | 2022-05-19 | 18:45:55.70 | 2 | 4 | Tomasic, Connor | 1000113049 | |
| 731996 | 731996 | 25 | 2022-05-19 | 18:46:18.90 | 2 | 5 | Tomasic, Connor | 1000113049 | |
| 731997 | 731997 | 26 | 2022-05-19 | 18:47:03.18 | 3 | 1 | Tomasic, Connor | 1000113049 |
#Kyle's id
pitches[pitches$Pitcher == 'Bischoff, Kyle', ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 28820 | 28820 | 267 | 2022-02-20 | 14:53:26.42 | 1 | 1 | Bischoff, Kyle | 1000014400 | |
| 28821 | 28821 | 268 | 2022-02-20 | 14:53:40.36 | 1 | 2 | Bischoff, Kyle | 1000014400 | |
| 28822 | 28822 | 269 | 2022-02-20 | 14:54:12.38 | 1 | 3 | Bischoff, Kyle | 1000014400 | |
| 28823 | 28823 | 270 | 2022-02-20 | 14:54:52.86 | 1 | 4 | Bischoff, Kyle | 1000014400 | |
| 28824 | 28824 | 271 | 2022-02-20 | 14:55:08.34 | 1 | 5 | Bischoff, Kyle | 1000014400 | |
| 28825 | 28825 | 272 | 2022-02-20 | 14:55:26.24 | 1 | 6 | Bischoff, Kyle | 1000014400 | |
| 28826 | 28826 | 273 | 2022-02-20 | 14:56:21.69 | 2 | 1 | Bischoff, Kyle | 1000014400 | |
| 28827 | 28827 | 274 | 2022-02-20 | 14:58:03.08 | 3 | 1 | Bischoff, Kyle | 1000014400 | |
| 28828 | 28828 | 275 | 2022-02-20 | 14:58:55.12 | 4 | 1 | Bischoff, Kyle | 1000014400 | |
| 28829 | 28829 | 276 | 2022-02-20 | 14:59:13.67 | 4 | 2 | Bischoff, Kyle | 1000014400 |
#1000014400
#Everything above 100 goes to Bischoff
#Check entire rows first
pitches[pitches$GameID == '20220519-UNebraska-1' & pitches$PitcherTeam == 'MIC_SPA' & pitches$pitch_count > 100, ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 732182 | 732182 | 211 | 2022-05-19 | 20:29:44.33 | 4 | 2 | Tomasic, Connor | 1000113049 | |
| 732197 | 732197 | 226 | 2022-05-19 | 20:40:48.59 | 1 | 1 | Tomasic, Connor | 1000113049 | |
| 732198 | 732198 | 227 | 2022-05-19 | 20:41:02.96 | 1 | 2 | Tomasic, Connor | 1000113049 | |
| 732199 | 732199 | 228 | 2022-05-19 | 20:41:17.56 | 1 | 3 | Tomasic, Connor | 1000113049 | |
| 732200 | 732200 | 229 | 2022-05-19 | 20:41:36.40 | 1 | 4 | Tomasic, Connor | 1000113049 | |
| 732201 | 732201 | 230 | 2022-05-19 | 20:41:56.36 | 1 | 5 | Tomasic, Connor | 1000113049 | |
| 732202 | 732202 | 231 | 2022-05-19 | 20:42:42.45 | 2 | 1 | Tomasic, Connor | 1000113049 | |
| 732203 | 732203 | 232 | 2022-05-19 | 20:43:02.84 | 2 | 2 | Tomasic, Connor | 1000113049 | |
| 732204 | 732204 | 233 | 2022-05-19 | 20:43:46.17 | 3 | 1 | Tomasic, Connor | 1000113049 | |
| 732205 | 732205 | 234 | 2022-05-19 | 20:44:10.91 | 3 | 2 | Tomasic, Connor | 1000113049 |
#Now Replace
pitches$PitcherId[pitches$GameID == '20220519-UNebraska-1' & pitches$PitcherTeam == 'MIC_SPA' & pitches$pitch_count > 100] <- 1000014400
#Replace Name too Why not
pitches$Pitcher[pitches$GameID == '20220519-UNebraska-1' & pitches$PitcherTeam == 'MIC_SPA' & pitches$pitch_count > 100] <- 'Bischoff, Kyle'
Last one: 20220520-PurdueUniversity-1. Nick Dean only went five innings (87 pitches), and Nigel Belgrave pitched the 6th and 7th. Together they threw 131, so ten pitches are either missing or possibly already assigned to Belgrave correctly. Box score: https://umterps.com/sports/baseball/stats/2022/purdue/boxscore/12805
# Look at all pitches for this game
pitches[pitches$GameID == '20220520-PurdueUniversity-1' & pitches$PitcherTeam == 'MAR_TER', ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 742792 | 742792 | 8 | 2022-05-20 | 17:07:18.79 | 1 | 1 | Dean, Nick | 1000079033 | |
| 742793 | 742793 | 9 | 2022-05-20 | 17:07:30.51 | 1 | 2 | Dean, Nick | 1000079033 | |
| 742794 | 742794 | 10 | 2022-05-20 | 17:07:42.35 | 1 | 3 | Dean, Nick | 1000079033 | |
| 742795 | 742795 | 11 | 2022-05-20 | 17:07:54.66 | 1 | 4 | Dean, Nick | 1000079033 | |
| 742796 | 742796 | 12 | 2022-05-20 | 17:08:29.49 | 2 | 1 | Dean, Nick | 1000079033 | |
| 742797 | 742797 | 13 | 2022-05-20 | 17:08:46.82 | 2 | 2 | Dean, Nick | 1000079033 | |
| 742798 | 742798 | 14 | 2022-05-20 | 17:08:59.35 | 2 | 3 | Dean, Nick | 1000079033 | |
| 742799 | 742799 | 15 | 2022-05-20 | 17:10:11.26 | 3 | 1 | Dean, Nick | 1000079033 | |
| 742800 | 742800 | 16 | 2022-05-20 | 17:10:47.07 | 3 | 2 | Dean, Nick | 1000079033 | |
| 742801 | 742801 | 17 | 2022-05-20 | 17:11:19.35 | 3 | 3 | Dean, Nick | 1000079033 |
#Nigel's id
pitches[pitches$Pitcher == 'Belgrave, Nigel', ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 13064 | 13064 | 310 | 2022-02-19 | 18:10:03.31 | 1 | 1 | Belgrave, Nigel | 1000100709 | |
| 13065 | 13065 | 311 | 2022-02-19 | 18:10:41.80 | 2 | 1 | Belgrave, Nigel | 1000100709 | |
| 13066 | 13066 | 312 | 2022-02-19 | 18:10:59.22 | 2 | 2 | Belgrave, Nigel | 1000100709 | |
| 13067 | 13067 | 313 | 2022-02-19 | 18:11:21.26 | 2 | 3 | Belgrave, Nigel | 1000100709 | |
| 13068 | 13068 | 314 | 2022-02-19 | 18:11:39.10 | 2 | 4 | Belgrave, Nigel | 1000100709 | |
| 13069 | 13069 | 315 | 2022-02-19 | 18:12:37.35 | 3 | 1 | Belgrave, Nigel | 1000100709 | |
| 13070 | 13070 | 316 | 2022-02-19 | 18:12:55.89 | 3 | 2 | Belgrave, Nigel | 1000100709 | |
| 13071 | 13071 | 317 | 2022-02-19 | 18:13:15.20 | 3 | 3 | Belgrave, Nigel | 1000100709 | |
| 13072 | 13072 | 318 | 2022-02-19 | 18:13:33.31 | 3 | 4 | Belgrave, Nigel | 1000100709 | |
| 13073 | 13073 | 319 | 2022-02-19 | 18:14:31.10 | 4 | 1 | Belgrave, Nigel | 1000100709 |
#1000100709
#Everything from 89 to 142 goes to Nigel
#Check entire rows first
pitches[pitches$GameID == '20220520-PurdueUniversity-1' & pitches$PitcherTeam == 'MAR_TER' & pitches$pitch_count > 88 & pitches$pitch_count < 143, ]
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 742971 | 742971 | 187 | 2022-05-20 | 19:01:16.15 | 1 | 1 | Dean, Nick | 1000079033 | |
| 742972 | 742972 | 188 | 2022-05-20 | 19:01:31.10 | 1 | 2 | Dean, Nick | 1000079033 | |
| 742973 | 742973 | 189 | 2022-05-20 | 19:01:45.12 | 1 | 3 | Dean, Nick | 1000079033 | |
| 742974 | 742974 | 190 | 2022-05-20 | 19:02:16.21 | 2 | 1 | Dean, Nick | 1000079033 | |
| 742975 | 742975 | 191 | 2022-05-20 | 19:02:32.00 | 2 | 2 | Dean, Nick | 1000079033 | |
| 742976 | 742976 | 192 | 2022-05-20 | 19:03:11.73 | 3 | 1 | Dean, Nick | 1000079033 | |
| 742977 | 742977 | 193 | 2022-05-20 | 19:03:32.74 | 3 | 2 | Dean, Nick | 1000079033 | |
| 742978 | 742978 | 194 | 2022-05-20 | 19:04:09.05 | 3 | 3 | Dean, Nick | 1000079033 | |
| 742979 | 742979 | 195 | 2022-05-20 | 19:04:38.86 | 3 | 4 | Dean, Nick | 1000079033 | |
| 742980 | 742980 | 196 | 2022-05-20 | 19:05:03.62 | 3 | 5 | Dean, Nick | 1000079033 |
#Now Replace
pitches$PitcherId[pitches$GameID == '20220520-PurdueUniversity-1' & pitches$PitcherTeam == 'MAR_TER' & pitches$pitch_count > 88 & pitches$pitch_count < 143] <- 1000100709
#Replace Name too Why not
pitches$Pitcher[pitches$GameID == '20220520-PurdueUniversity-1' & pitches$PitcherTeam == 'MAR_TER' & pitches$pitch_count > 88 & pitches$pitch_count < 143] <- 'Belgrave, Nigel'
Unfortunately, there are probably other instances where the TrackMan operator didn’t record a pitching change, so some pitches early in a reliever’s outing look like they came late in the starter’s. We caught the worst cases, where the error put a pitcher above 120 pitches, but there could be more. I don’t think it’s feasible to look through 1.2 million pitches and double-check them all.
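The check for the worst cases can be sketched as a per-game, per-pitcher count followed by a threshold filter. This is only an illustration on toy data; the 120 cutoff is the one mentioned above, and everything else (column values, object names) is made up.

```r
# Toy data standing in for `pitches`: one pitcher with 130 pitches, one with 40
toy <- data.frame(
  GameID = rep(c("G1", "G2"), times = c(130, 40)),
  PitcherId = rep(c(11, 22), times = c(130, 40))
)
# Count pitches per pitcher per game
totals <- aggregate(n ~ GameID + PitcherId, data = transform(toy, n = 1), FUN = sum)
# Flag outings above the cutoff; lowering it widens the search
suspects <- totals[totals$n > 120, ]
suspects
```

A lower cutoff would surface more candidates, at the cost of more box scores to check by hand.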
I’m going to run the chunks that assign pitch counts again now that the pitcher IDs are accurate.
Add a column that specifies how many pitches a pitcher has thrown
pitches$pitch_count <- with(pitches, ave(seq_along(paste(GameID, PitcherId)), paste(GameID, PitcherId), FUN = seq_along)) - 1
# Add a new factor column to the dataframe for the pitch group
pitches$pitch_group <- as.factor(ifelse(pitches$pitch_count < 100, (pitches$pitch_count) %/% 10 + 1, 11))
# Check the updated dataframe
head(pitches, 250)
X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 2022-02-18 | 13:32:19.86 | 1 | 1 | Kniskern, Trevor | 1000054486 | |
| 2 | 2 | 2 | 2022-02-18 | 13:32:36.00 | 1 | 2 | Kniskern, Trevor | 1000054486 | |
| 3 | 3 | 3 | 2022-02-18 | 13:33:12.45 | 1 | 3 | Kniskern, Trevor | 1000054486 | |
| 4 | 4 | 4 | 2022-02-18 | 13:33:53.17 | 2 | 1 | Kniskern, Trevor | 1000054486 | |
| 5 | 5 | 5 | 2022-02-18 | 13:34:10.28 | 2 | 2 | Kniskern, Trevor | 1000054486 | |
| 6 | 6 | 6 | 2022-02-18 | 13:34:29.80 | 2 | 3 | Kniskern, Trevor | 1000054486 | |
| 7 | 7 | 7 | 2022-02-18 | 13:34:50.36 | 2 | 4 | Kniskern, Trevor | 1000054486 | |
| 8 | 8 | 8 | 2022-02-18 | 13:35:24.42 | 2 | 5 | Kniskern, Trevor | 1000054486 | |
| 9 | 9 | 9 | 2022-02-18 | 13:36:11.69 | 3 | 1 | Kniskern, Trevor | 1000054486 | |
| 10 | 10 | 10 | 2022-02-18 | 13:36:36.90 | 3 | 2 | Kniskern, Trevor | 1000054486 |
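To make the two columns concrete, here is the same logic on a toy data frame. pitch_count is zero-based (the number of pitches the pitcher had already thrown in that game), and the bin number goes up every ten pitches, with everything from 100 on in bin 11. The helper name `bin` is hypothetical.

```r
# Toy game: pitcher 1 throws, pitcher 2 relieves, and rows are interleaved
toy <- data.frame(GameID = "G1", PitcherId = c(1, 1, 2, 1, 2))
# Running index within each game/pitcher pair, minus 1 so the first pitch is 0
toy$pitch_count <- with(toy, ave(seq_along(PitcherId), paste(GameID, PitcherId), FUN = seq_along)) - 1
toy$pitch_count

# Same binning rule as the notebook, wrapped in a hypothetical helper
bin <- function(pc) ifelse(pc < 100, pc %/% 10 + 1, 11)
bins <- bin(c(0, 9, 10, 99, 100, 140))
bins
```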
I want better names for the pitch_group levels
pitches$pitch_bin <- pitches$pitch_group
pitches$pitch_group <- NA
pitches$pitch_group[pitches$pitch_bin == '1'] <- '0-9 Pitches'
pitches$pitch_group[pitches$pitch_bin == '2'] <- '10-19 Pitches'
pitches$pitch_group[pitches$pitch_bin == '3'] <- '20-29 Pitches'
pitches$pitch_group[pitches$pitch_bin == '4'] <- '30-39 Pitches'
pitches$pitch_group[pitches$pitch_bin == '5'] <- '40-49 Pitches'
pitches$pitch_group[pitches$pitch_bin == '6'] <- '50-59 Pitches'
pitches$pitch_group[pitches$pitch_bin == '7'] <- '60-69 Pitches'
pitches$pitch_group[pitches$pitch_bin == '8'] <- '70-79 Pitches'
pitches$pitch_group[pitches$pitch_bin == '9'] <- '80-89 Pitches'
pitches$pitch_group[pitches$pitch_bin == '10'] <- '90-99 Pitches'
pitches$pitch_group[pitches$pitch_bin == '11'] <- 'More Than 100 Pitches'
#Make sure the order is correct. It's really annoying if regression output isn't in ascending order
sqldf("SELECT pitch_group, count(*) from pitches GROUP BY pitch_group ORDER BY pitch_group")
pitch_group <chr> | count(*) <int> | |||
|---|---|---|---|---|
| 0-9 Pitches | 297887 | |||
| 10-19 Pitches | 240230 | |||
| 20-29 Pitches | 171000 | |||
| 30-39 Pitches | 122061 | |||
| 40-49 Pitches | 90347 | |||
| 50-59 Pitches | 69969 | |||
| 60-69 Pitches | 55980 | |||
| 70-79 Pitches | 42674 | |||
| 80-89 Pitches | 29738 | |||
| 90-99 Pitches | 16662 |
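The alphabetical sort happens to give the right order here, but that depends on how the labels are spelled. One alternative, sketched below under the assumption that the same labels are wanted, is to build the factor with `cut()` so the level order is set explicitly. Note that a factor's levels do not survive a round trip through write.csv and read.csv, so they would need to be re-applied after reloading.

```r
# Toy pitch counts covering the bin edges
pitch_count <- c(0, 9, 10, 57, 99, 100, 131)
# Same labels as the notebook, generated instead of typed out
labels <- c(paste0(seq(0, 90, by = 10), "-", seq(9, 99, by = 10), " Pitches"),
            "More Than 100 Pitches")
# right = FALSE gives intervals [0,10), [10,20), ..., [100, Inf)
pitch_group <- cut(pitch_count, breaks = c(seq(0, 100, by = 10), Inf),
                   right = FALSE, labels = labels)
levels(pitch_group)
```

Because '0-9 Pitches' is the first level, it is already the reference category in a regression without a separate relevel step.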
write.csv(pitches, 'C:\\Users\\Nick\\UCSB Baseball\\All_College_TM_19_22_Mistakes_Fixed_Pitchcounts_added.csv')
pitches <- read.csv('C:\\Users\\Nick\\UCSB Baseball\\All_College_TM_19_22_Mistakes_Fixed_Pitchcounts_added.csv')
Horizontal break is meaningless if there’s no distinction made between lefties and righties
pitches$Hbrk <- NA
pitches$Hbrk[pitches$PitcherThrows == 'Right'] <- -pitches$HorzBreak
## Warning in pitches$Hbrk[pitches$PitcherThrows == "Right"] <- -pitches$HorzBreak:
## number of items to replace is not a multiple of replacement length
pitches$Hbrk[pitches$PitcherThrows == 'Left'] <- pitches$HorzBreak
## Warning in pitches$Hbrk[pitches$PitcherThrows == "Left"] <- pitches$HorzBreak:
## number of items to replace is not a multiple of replacement length
pitches %>% group_by(PitcherThrows) %>% summarise(n =n())
PitcherThrows <chr> | n <int> | |||
|---|---|---|---|---|
| Left | 315035 | |||
| Right | 829400 | |||
| RIght | 17 | |||
| Undefined | 3 |
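The two "number of items to replace is not a multiple of replacement length" warnings above suggest that the right-hand side was the full HorzBreak column rather than the matching rows, in which case R fills the selected rows in order from the top of the column and the values no longer line up with their pitches. A sketch of an aligned version on toy data follows. Treating 'RIght' as a typo for 'Right' is an assumption, and 'Undefined' is left as NA.

```r
# Toy rows standing in for `pitches`
toy <- data.frame(
  PitcherThrows = c("Right", "Left", "RIght", "Undefined", "Left"),
  HorzBreak = c(10, -8, 12, 3, -6),
  stringsAsFactors = FALSE
)
# Assumption: 'RIght' is a data-entry typo, so compare case-insensitively
throws <- tolower(toy$PitcherThrows)
right <- which(throws == "right")
left <- which(throws == "left")
# Index both sides with the same rows so each pitch keeps its own value
toy$Hbrk <- NA_real_
toy$Hbrk[right] <- -toy$HorzBreak[right]
toy$Hbrk[left] <- toy$HorzBreak[left]
toy
```

With this indexing no recycling takes place, so the warnings should not appear.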
##Year Averages used for differences
##Find averages for the dependent measures: velocity, spin rate, break. Still to do: command and a results-based metric
pitcher_year_group <- sqldf("
  SELECT PitcherId, TaggedPitchType, SUBSTRING(Date, 1, 4) AS YEAR,
         MAX(Pitcher) AS PitcherName, MAX(PitcherTeam) AS Team,
         AVG(RelSpeed) AS avg_RelSpeedYear, AVG(SpinRate) AS avg_spinrateYear,
         AVG(InducedVertBreak) AS avg_IndVertBrkYear, AVG(Hbrk) AS avg_HBrkYear,
         COUNT(*) AS PitchesThrown
  FROM pitches
  WHERE PitcherId IS NOT NULL
  GROUP BY PitcherId, SUBSTRING(Date, 1, 4), TaggedPitchType
  ORDER BY COUNT(*) DESC
")
Relevel pitch_group. Make 0-9 the baseline
pitches$pitch_group <- relevel(factor(pitches$pitch_group), ref = '0-9 Pitches')
Next we have to merge the averages back in so that each row includes the summary stats for that pitcher, pitch type and year. Later we’ll also calculate the difference between each individual pitch and those averages.
pitches$YEAR <- substr(pitches$Date, 1, 4)
pitcherYearAvgs <- left_join(pitches, pitcher_year_group, by = c('PitcherId', 'TaggedPitchType', 'YEAR'))
head(pitcherYearAvgs, 50)
X.1 <int> | X <int> | PitchNo <int> | Date <chr> | Time <chr> | PAofInning <int> | PitchofPA <int> | Pitcher <chr> | PitcherId <dbl> | ||
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 2022-02-18 | 13:32:19.86 | 1 | 1 | Kniskern, Trevor | 1000054486 | |
| 2 | 2 | 2 | 2 | 2022-02-18 | 13:32:36.00 | 1 | 2 | Kniskern, Trevor | 1000054486 | |
| 3 | 3 | 3 | 3 | 2022-02-18 | 13:33:12.45 | 1 | 3 | Kniskern, Trevor | 1000054486 | |
| 4 | 4 | 4 | 4 | 2022-02-18 | 13:33:53.17 | 2 | 1 | Kniskern, Trevor | 1000054486 | |
| 5 | 5 | 5 | 5 | 2022-02-18 | 13:34:10.28 | 2 | 2 | Kniskern, Trevor | 1000054486 | |
| 6 | 6 | 6 | 6 | 2022-02-18 | 13:34:29.80 | 2 | 3 | Kniskern, Trevor | 1000054486 | |
| 7 | 7 | 7 | 7 | 2022-02-18 | 13:34:50.36 | 2 | 4 | Kniskern, Trevor | 1000054486 | |
| 8 | 8 | 8 | 8 | 2022-02-18 | 13:35:24.42 | 2 | 5 | Kniskern, Trevor | 1000054486 | |
| 9 | 9 | 9 | 9 | 2022-02-18 | 13:36:11.69 | 3 | 1 | Kniskern, Trevor | 1000054486 | |
| 10 | 10 | 10 | 10 | 2022-02-18 | 13:36:36.90 | 3 | 2 | Kniskern, Trevor | 1000054486 |
pitcherYearAvgs$RelSpeedDiffYear <- pitcherYearAvgs$RelSpeed - pitcherYearAvgs$avg_RelSpeedYear
pitcherYearAvgs$VertBreakDiffYear <- pitcherYearAvgs$InducedVertBreak - pitcherYearAvgs$avg_IndVertBrkYear
pitcherYearAvgs$HorzBreakDiffYear <- pitcherYearAvgs$Hbrk - pitcherYearAvgs$avg_HBrkYear
pitcherYearAvgs$SpinRateDiffYear <- pitcherYearAvgs$SpinRate - pitcherYearAvgs$avg_spinrateYear
Year averages would also work, and might be better, for WHIP regressions and other results-based tests.
Have to make individual data frames for each pitch type
fastballsYear <- pitcherYearAvgs[pitcherYearAvgs$TaggedPitchType == 'Fastball', ]
changeupsYear <- pitcherYearAvgs[pitcherYearAvgs$TaggedPitchType == 'ChangeUp', ]
curveballsYear <- pitcherYearAvgs[pitcherYearAvgs$TaggedPitchType == 'Curveball', ]
cuttersYear <- pitcherYearAvgs[pitcherYearAvgs$TaggedPitchType == 'Cutter', ]
slidersYear <- pitcherYearAvgs[pitcherYearAvgs$TaggedPitchType == 'Slider', ]
sinkersYear <- pitcherYearAvgs[pitcherYearAvgs$TaggedPitchType == 'Sinker', ]
splittersYear <- pitcherYearAvgs[pitcherYearAvgs$TaggedPitchType == 'Splitter', ]
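The same per-pitch-type data frames can also be produced in one step with `split()`. The sketch below uses toy data and also shows one difference worth knowing: filtering with `==` keeps an all-NA row for every pitch whose TaggedPitchType is NA, while `split()` drops those rows.

```r
# Toy rows standing in for `pitcherYearAvgs`, including one untagged pitch
toy <- data.frame(
  TaggedPitchType = c("Fastball", "Slider", "Fastball", NA, "ChangeUp"),
  RelSpeed = c(91, 82, 92, 88, 84),
  stringsAsFactors = FALSE
)
# One data frame per pitch type, named by type
by_type <- split(toy, toy$TaggedPitchType)
names(by_type)
nrow(by_type$Fastball)

# The == filter returns an extra all-NA row for the untagged pitch
nrow(toy[toy$TaggedPitchType == "Fastball", ])
```

In a regression those all-NA rows are dropped anyway, and they show up in the "observations deleted due to missingness" count.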
##Regressions
###Continuous
Fastballs
fastballSpinRateYear <- lm(SpinRateDiffYear ~ pitch_count, data = fastballsYear)
summary(fastballSpinRateYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_count, data = fastballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1442.50 -56.15 4.81 62.88 1540.86
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.241863 0.192084 27.29 <2e-16 ***
## pitch_count -0.191874 0.005283 -36.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 100.9 on 634095 degrees of freedom
## (3401 observations deleted due to missingness)
## Multiple R-squared: 0.002076, Adjusted R-squared: 0.002074
## F-statistic: 1319 on 1 and 634095 DF, p-value: < 2.2e-16
fastballRelSpeedYear <- lm(RelSpeedDiffYear ~ pitch_count, data = fastballsYear)
summary(fastballRelSpeedYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_count, data = fastballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.933 -0.768 0.072 0.881 14.643
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3544947 0.0027376 129.5 <2e-16 ***
## pitch_count -0.0129755 0.0000753 -172.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.439 on 635019 degrees of freedom
## (2477 observations deleted due to missingness)
## Multiple R-squared: 0.04468, Adjusted R-squared: 0.04467
## F-statistic: 2.97e+04 on 1 and 635019 DF, p-value: < 2.2e-16
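To read the release-speed fit above in practical terms, the fitted line can be evaluated at a few pitch counts. The coefficients below are copied from the summary; the result is in whatever units RelSpeed is recorded in (assumed here to be mph).

```r
# Coefficients copied from the fastball release-speed summary above
intercept <- 0.3544947
slope <- -0.0129755
# Predicted deviation from the pitcher's yearly average at selected pitch counts
pitch_count <- c(0, 50, 100)
predicted <- intercept + slope * pitch_count
predicted
# Pitch count at which the fitted line crosses the yearly average
crossing <- -intercept / slope
crossing
```

So the fitted line starts about 0.35 above the yearly average and ends up about 0.94 below it by pitch 100, a swing of roughly 1.3 over 100 pitches.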
fastballVertBreakYear <- lm(VertBreakDiffYear ~ pitch_count, data = fastballsYear)
summary(fastballVertBreakYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_count, data = fastballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.517 -1.766 0.073 1.869 33.974
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0438247 0.0058778 7.456 8.93e-14 ***
## pitch_count -0.0016038 0.0001616 -9.922 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.085 on 632796 degrees of freedom
## (4700 observations deleted due to missingness)
## Multiple R-squared: 0.0001555, Adjusted R-squared: 0.000154
## F-statistic: 98.44 on 1 and 632796 DF, p-value: < 2.2e-16
fastballHorzBreakYear <- lm(HorzBreakDiffYear ~ pitch_count, data = fastballsYear)
summary(fastballHorzBreakYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_count, data = fastballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.538 -9.443 -0.854 9.534 37.277
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0256665 0.0214953 -1.194 0.232
## pitch_count 0.0009401 0.0005915 1.589 0.112
##
## Residual standard error: 11.25 on 629345 degrees of freedom
## (8151 observations deleted due to missingness)
## Multiple R-squared: 4.014e-06, Adjusted R-squared: 2.425e-06
## F-statistic: 2.526 on 1 and 629345 DF, p-value: 0.112
Changeups
changeupSpinRateYear <- lm(SpinRateDiffYear ~ pitch_count, data = changeupsYear)
summary(changeupSpinRateYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_count, data = changeupsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1150.81 -83.80 -6.27 70.90 2408.49
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.6678 0.7466 15.63 <2e-16 ***
## pitch_count -0.3628 0.0183 -19.82 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 161.2 on 123178 degrees of freedom
## (619 observations deleted due to missingness)
## Multiple R-squared: 0.003181, Adjusted R-squared: 0.003172
## F-statistic: 393 on 1 and 123178 DF, p-value: < 2.2e-16
changeupRelSpeedYear <- lm(RelSpeedDiffYear ~ pitch_count, data = changeupsYear)
summary(changeupRelSpeedYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_count, data = changeupsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.6986 -0.8695 0.0089 0.8904 12.9319
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2661535 0.0072789 36.56 <2e-16 ***
## pitch_count -0.0082802 0.0001785 -46.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.575 on 123591 degrees of freedom
## (206 observations deleted due to missingness)
## Multiple R-squared: 0.01712, Adjusted R-squared: 0.01711
## F-statistic: 2152 on 1 and 123591 DF, p-value: < 2.2e-16
changeupVertBreakYear <- lm(VertBreakDiffYear ~ pitch_count, data = changeupsYear)
summary(changeupVertBreakYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_count, data = changeupsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.704 -2.196 0.033 2.283 21.938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0979812 0.0179679 -5.453 4.96e-08 ***
## pitch_count 0.0030479 0.0004406 6.918 4.59e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.881 on 123219 degrees of freedom
## (578 observations deleted due to missingness)
## Multiple R-squared: 0.0003883, Adjusted R-squared: 0.0003802
## F-statistic: 47.86 on 1 and 123219 DF, p-value: 4.591e-12
changeupHorzBreakYear <- lm(HorzBreakDiffYear ~ pitch_count, data = changeupsYear)
summary(changeupHorzBreakYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_count, data = changeupsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.727 -9.009 -0.349 9.099 33.723
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.035870 0.051072 -0.702 0.482
## pitch_count 0.001116 0.001252 0.891 0.373
##
## Residual standard error: 10.99 on 122247 degrees of freedom
## (1550 observations deleted due to missingness)
## Multiple R-squared: 6.496e-06, Adjusted R-squared: -1.684e-06
## F-statistic: 0.7942 on 1 and 122247 DF, p-value: 0.3728
Curveballs
curveballSpinRateYear <- lm(SpinRateDiffYear ~ pitch_count, data = curveballsYear)
summary(curveballSpinRateYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_count, data = curveballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1560.49 -58.13 6.70 70.89 1955.26
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.59784 0.71775 7.799 6.3e-15 ***
## pitch_count -0.18349 0.01817 -10.097 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 134.9 on 87522 degrees of freedom
## (1013 observations deleted due to missingness)
## Multiple R-squared: 0.001164, Adjusted R-squared: 0.001152
## F-statistic: 102 on 1 and 87522 DF, p-value: < 2.2e-16
curveballRelSpeedYear <- lm(RelSpeedDiffYear ~ pitch_count, data = curveballsYear)
summary(curveballRelSpeedYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_count, data = curveballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.1420 -0.9747 -0.0423 0.9135 19.9656
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1894395 0.0087500 21.65 <2e-16 ***
## pitch_count -0.0062093 0.0002215 -28.03 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.651 on 88291 degrees of freedom
## (244 observations deleted due to missingness)
## Multiple R-squared: 0.008819, Adjusted R-squared: 0.008808
## F-statistic: 785.5 on 1 and 88291 DF, p-value: < 2.2e-16
curveballVertBreakYear <- lm(VertBreakDiffYear ~ pitch_count, data = curveballsYear)
summary(curveballVertBreakYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_count, data = curveballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.273 -2.232 -0.176 1.876 39.805
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0644487 0.0205406 -3.138 0.0017 **
## pitch_count 0.0021101 0.0005196 4.061 4.89e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.866 on 87887 degrees of freedom
## (648 observations deleted due to missingness)
## Multiple R-squared: 0.0001876, Adjusted R-squared: 0.0001762
## F-statistic: 16.49 on 1 and 87887 DF, p-value: 4.892e-05
curveballHorzBreakYear <- lm(HorzBreakDiffYear ~ pitch_count, data = curveballsYear)
summary(curveballHorzBreakYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_count, data = curveballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.637 -8.962 -0.412 8.981 34.137
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0052652 0.0582718 -0.090 0.928
## pitch_count 0.0001725 0.0014749 0.117 0.907
##
## Residual standard error: 10.94 on 87453 degrees of freedom
## (1082 observations deleted due to missingness)
## Multiple R-squared: 1.564e-07, Adjusted R-squared: -1.128e-05
## F-statistic: 0.01368 on 1 and 87453 DF, p-value: 0.9069
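The same four fits recur for every pitch type, so one option is to build the formulas programmatically and collect the slopes in a single table. The sketch below runs on a simulated data frame: `toy` is a hypothetical stand-in for data such as curveballsYear, and only its column names come from this notebook.

```r
set.seed(1)

# Hypothetical stand-in data with the same column names as the notebook
n <- 500
toy <- data.frame(pitch_count = sample(0:110, n, replace = TRUE))
toy$SpinRateDiffYear  <- 3 - 0.1 * toy$pitch_count + rnorm(n, sd = 100)
toy$RelSpeedDiffYear  <- 0.35 - 0.013 * toy$pitch_count + rnorm(n, sd = 1.4)
toy$VertBreakDiffYear <- rnorm(n, sd = 3)
toy$HorzBreakDiffYear <- rnorm(n, sd = 11)

responses <- c("SpinRateDiffYear", "RelSpeedDiffYear",
               "VertBreakDiffYear", "HorzBreakDiffYear")

# Fit one model per response; reformulate() builds e.g. SpinRateDiffYear ~ pitch_count
fits <- lapply(responses, function(resp) {
  lm(reformulate("pitch_count", response = resp), data = toy)
})
names(fits) <- responses

# Keep the pitch_count row of each coefficient table: one row per response
slopes <- t(sapply(fits, function(fit) coef(summary(fit))["pitch_count", ]))
print(slopes)
```

Wrapping this in a function that takes the data frame as an argument would let each pitch type reuse it without retyping model names.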
Cutters
cutterSpinRateYear <- lm(SpinRateDiffYear ~ pitch_count, data = cuttersYear)
summary(cutterSpinRateYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_count, data = cuttersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1278.40 -53.47 3.33 61.35 1368.95
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.14612 1.55200 5.249 1.55e-07 ***
## pitch_count -0.27648 0.04013 -6.890 5.85e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 115.9 on 13288 degrees of freedom
## (110 observations deleted due to missingness)
## Multiple R-squared: 0.003559, Adjusted R-squared: 0.003484
## F-statistic: 47.47 on 1 and 13288 DF, p-value: 5.848e-12
cutterRelSpeedYear <- lm(RelSpeedDiffYear ~ pitch_count, data = cuttersYear)
summary(cutterRelSpeedYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_count, data = cuttersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.2257 -0.8456 0.0077 0.8685 11.4163
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2166739 0.0202787 10.69 <2e-16 ***
## pitch_count -0.0073515 0.0005242 -14.03 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.519 on 13371 degrees of freedom
## (27 observations deleted due to missingness)
## Multiple R-squared: 0.0145, Adjusted R-squared: 0.01442
## F-statistic: 196.7 on 1 and 13371 DF, p-value: < 2.2e-16
cutterVertBreakYear <- lm(VertBreakDiffYear ~ pitch_count, data = cuttersYear)
summary(cutterVertBreakYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_count, data = cuttersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.7201 -1.8558 -0.0654 1.7084 22.0430
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.083845 0.042984 -1.951 0.0511 .
## pitch_count 0.002844 0.001111 2.560 0.0105 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.217 on 13352 degrees of freedom
## (46 observations deleted due to missingness)
## Multiple R-squared: 0.0004906, Adjusted R-squared: 0.0004158
## F-statistic: 6.554 on 1 and 13352 DF, p-value: 0.01047
cutterHorzBreakYear <- lm(HorzBreakDiffYear ~ pitch_count, data = cuttersYear)
summary(cutterHorzBreakYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_count, data = cuttersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.495 -8.992 -1.154 9.133 33.669
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.250328 0.147726 1.695 0.0902 .
## pitch_count -0.008522 0.003829 -2.225 0.0261 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.02 on 13242 degrees of freedom
## (156 observations deleted due to missingness)
## Multiple R-squared: 0.0003738, Adjusted R-squared: 0.0002983
## F-statistic: 4.952 on 1 and 13242 DF, p-value: 0.02608
Sliders
sliderSpinRateYear <- lm(SpinRateDiffYear ~ pitch_count, data = slidersYear)
summary(sliderSpinRateYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_count, data = slidersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1916.90 -58.25 10.43 77.45 2294.62
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.30553 0.51832 12.16 <2e-16 ***
## pitch_count -0.21943 0.01377 -15.94 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 153.2 on 209264 degrees of freedom
## (4771 observations deleted due to missingness)
## Multiple R-squared: 0.001212, Adjusted R-squared: 0.001208
## F-statistic: 254 on 1 and 209264 DF, p-value: < 2.2e-16
sliderRelSpeedYear <- lm(RelSpeedDiffYear ~ pitch_count, data = slidersYear)
summary(sliderRelSpeedYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_count, data = slidersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.5727 -0.9945 -0.0015 0.9936 16.4665
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2058581 0.0058695 35.07 <2e-16 ***
## pitch_count -0.0071664 0.0001559 -45.96 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.753 on 213530 degrees of freedom
## (505 observations deleted due to missingness)
## Multiple R-squared: 0.009794, Adjusted R-squared: 0.009789
## F-statistic: 2112 on 1 and 213530 DF, p-value: < 2.2e-16
sliderVertBreakYear <- lm(VertBreakDiffYear ~ pitch_count, data = slidersYear)
summary(sliderVertBreakYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_count, data = slidersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.222 -2.179 -0.039 2.082 56.100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0178076 0.0129526 -1.375 0.1692
## pitch_count 0.0006200 0.0003442 1.801 0.0716 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.86 on 212707 degrees of freedom
## (1328 observations deleted due to missingness)
## Multiple R-squared: 1.526e-05, Adjusted R-squared: 1.056e-05
## F-statistic: 3.245 on 1 and 212707 DF, p-value: 0.07163
sliderHorzBreakYear <- lm(HorzBreakDiffYear ~ pitch_count, data = slidersYear)
summary(sliderHorzBreakYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_count, data = slidersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.524 -9.244 -0.921 9.311 38.292
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.027955 0.037466 -0.746 0.456
## pitch_count 0.000975 0.000997 0.978 0.328
##
## Residual standard error: 11.13 on 211266 degrees of freedom
## (2769 observations deleted due to missingness)
## Multiple R-squared: 4.527e-06, Adjusted R-squared: -2.068e-07
## F-statistic: 0.9563 on 1 and 211266 DF, p-value: 0.3281
Sinkers
# Assumption: sinkersYear is a data frame built the same way as slidersYear.
# Output is omitted here until the chunk is rerun.
sinkerSpinRateYear <- lm(SpinRateDiffYear ~ pitch_count, data = sinkersYear)
summary(sinkerSpinRateYear)
sinkerRelSpeedYear <- lm(RelSpeedDiffYear ~ pitch_count, data = sinkersYear)
summary(sinkerRelSpeedYear)
sinkerVertBreakYear <- lm(VertBreakDiffYear ~ pitch_count, data = sinkersYear)
summary(sinkerVertBreakYear)
sinkerHorzBreakYear <- lm(HorzBreakDiffYear ~ pitch_count, data = sinkersYear)
summary(sinkerHorzBreakYear)
Factor Levels (pitch_group)
Fastballs
fastballSpinRateBinYear <- lm(SpinRateDiffYear ~ pitch_group, data = fastballsYear)
summary(fastballSpinRateBinYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_group, data = fastballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1445.55 -56.12 4.81 62.86 1542.62
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.7206 0.2404 11.317 < 2e-16 ***
## pitch_group10-19 Pitches 1.9217 0.3666 5.242 1.59e-07 ***
## pitch_group20-29 Pitches -0.3470 0.4054 -0.856 0.392
## pitch_group30-39 Pitches -3.9807 0.4592 -8.669 < 2e-16 ***
## pitch_group40-49 Pitches -6.1259 0.5215 -11.747 < 2e-16 ***
## pitch_group50-59 Pitches -8.0382 0.5804 -13.850 < 2e-16 ***
## pitch_group60-69 Pitches -12.3295 0.6343 -19.438 < 2e-16 ***
## pitch_group70-79 Pitches -15.3186 0.7194 -21.295 < 2e-16 ***
## pitch_group80-89 Pitches -15.7449 0.8620 -18.266 < 2e-16 ***
## pitch_group90-99 Pitches -13.2271 1.1265 -11.742 < 2e-16 ***
## pitch_groupMore Than 100 Pitches -12.4073 1.6155 -7.680 1.59e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 100.9 on 634086 degrees of freedom
## (3401 observations deleted due to missingness)
## Multiple R-squared: 0.002544, Adjusted R-squared: 0.002528
## F-statistic: 161.7 on 10 and 634086 DF, p-value: < 2.2e-16
#fastballSR <- as.data.frame(fastballSpinRateBinYear$coefficients)
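The commented-out `as.data.frame()` call would keep only the point estimates. If the standard errors and intervals are wanted in the same table, one possible approach is sketched below on simulated data. `toy`, `toy_fit` and the output column names are hypothetical; only `SpinRateDiffYear` and `pitch_group` come from this notebook.

```r
set.seed(7)

# Hypothetical stand-in data: three bins instead of the notebook's eleven
toy <- data.frame(
  pitch_group      = factor(rep(c("a", "b", "c"), each = 30)),
  SpinRateDiffYear = rnorm(90, sd = 100)
)
toy_fit <- lm(SpinRateDiffYear ~ pitch_group, data = toy)

# Estimates, standard errors and 95% confidence intervals in one data frame
ci <- confint(toy_fit)
coef_table <- data.frame(
  term      = rownames(ci),
  estimate  = unname(coef(toy_fit)),
  std_error = unname(coef(summary(toy_fit))[, "Std. Error"]),
  conf_low  = unname(ci[, 1]),
  conf_high = unname(ci[, 2])
)
print(coef_table)
```

A table in this shape is easier to bind across pitch types or pass to a plotting function than the printed summaries.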
fastballRelSpeedBinYear <- lm(RelSpeedDiffYear ~ pitch_group, data = fastballsYear)
summary(fastballRelSpeedBinYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_group, data = fastballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.843 -0.766 0.074 0.880 14.762
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.361035 0.003423 105.46 <2e-16 ***
## pitch_group10-19 Pitches -0.193071 0.005220 -36.98 <2e-16 ***
## pitch_group20-29 Pitches -0.382668 0.005773 -66.28 <2e-16 ***
## pitch_group30-39 Pitches -0.527655 0.006540 -80.69 <2e-16 ***
## pitch_group40-49 Pitches -0.641584 0.007426 -86.40 <2e-16 ***
## pitch_group50-59 Pitches -0.732228 0.008264 -88.60 <2e-16 ***
## pitch_group60-69 Pitches -0.884686 0.009033 -97.94 <2e-16 ***
## pitch_group70-79 Pitches -0.955556 0.010243 -93.29 <2e-16 ***
## pitch_group80-89 Pitches -0.989537 0.012275 -80.62 <2e-16 ***
## pitch_group90-99 Pitches -1.021188 0.016038 -63.67 <2e-16 ***
## pitch_groupMore Than 100 Pitches -0.953133 0.022994 -41.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.438 on 635010 degrees of freedom
## (2477 observations deleted due to missingness)
## Multiple R-squared: 0.04671, Adjusted R-squared: 0.04669
## F-statistic: 3111 on 10 and 635010 DF, p-value: < 2.2e-16
#as.data.frame(fastballRelSpeedBinYear$coefficients)
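Because pitch_group is a factor, `lm()` uses treatment contrasts by default: the intercept is the mean of the reference bin (the first level, which has no row of its own in the table above), and every other coefficient is that bin's difference from the reference. The sketch below checks this on simulated data and then applies the same rule to two estimates copied from the output above. The `toy` data frame and its bin labels are hypothetical stand-ins.

```r
set.seed(42)

# Hypothetical data: three bins standing in for pitch_group
bins <- c("0-9 Pitches", "10-19 Pitches", "20-29 Pitches")
toy <- data.frame(
  pitch_group = factor(rep(bins, each = 50), levels = bins),
  diff        = c(rnorm(50, 0.4), rnorm(50, 0.2), rnorm(50, 0.0))
)
fit <- lm(diff ~ pitch_group, data = toy)

# Bin means computed directly from the data
group_means <- tapply(toy$diff, toy$pitch_group, mean)

# Intercept = reference-bin mean; intercept + coefficient = each other bin's mean
b <- coef(fit)
rebuilt <- c(b[1], b[1] + b[-1])
print(all.equal(as.numeric(rebuilt), as.numeric(group_means)))

# Same rule applied to the fastball RelSpeedDiffYear estimates printed above
intercept  <- 0.361035
coef_90_99 <- -1.021188
mean_90_99 <- intercept + coef_90_99   # fitted mean for the 90-99 bin: -0.660153
print(mean_90_99)
```

Read this way, each row in the binned tables is a contrast against the first bin rather than an absolute level.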
fastballVertBreakBinYear <- lm(VertBreakDiffYear ~ pitch_group, data = fastballsYear)
summary(fastballVertBreakBinYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_group, data = fastballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.507 -1.766 0.073 1.870 33.978
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.030503 0.007358 4.146 3.39e-05 ***
## pitch_group10-19 Pitches -0.012767 0.011221 -1.138 0.255216
## pitch_group20-29 Pitches -0.008705 0.012408 -0.702 0.482983
## pitch_group30-39 Pitches -0.052571 0.014054 -3.741 0.000184 ***
## pitch_group40-49 Pitches -0.054728 0.015958 -3.429 0.000605 ***
## pitch_group50-59 Pitches -0.047091 0.017759 -2.652 0.008009 **
## pitch_group60-69 Pitches -0.070036 0.019409 -3.608 0.000308 ***
## pitch_group70-79 Pitches -0.114292 0.022010 -5.193 2.07e-07 ***
## pitch_group80-89 Pitches -0.118241 0.026363 -4.485 7.29e-06 ***
## pitch_group90-99 Pitches -0.230369 0.034469 -6.683 2.34e-11 ***
## pitch_groupMore Than 100 Pitches -0.174715 0.049467 -3.532 0.000413 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.085 on 632787 degrees of freedom
## (4700 observations deleted due to missingness)
## Multiple R-squared: 0.0001751, Adjusted R-squared: 0.0001593
## F-statistic: 11.08 on 10 and 632787 DF, p-value: < 2.2e-16
#fastballV <- as.data.frame(fastballVertBreakBinYear$coefficients)
fastballHorzBreakBinYear <- lm(HorzBreakDiffYear ~ pitch_group, data = fastballsYear)
summary(fastballHorzBreakBinYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_group, data = fastballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.509 -9.444 -0.856 9.532 37.276
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.04607 0.02690 -1.713 0.0867 .
## pitch_group10-19 Pitches 0.03736 0.04104 0.910 0.3626
## pitch_group20-29 Pitches 0.08406 0.04539 1.852 0.0640 .
## pitch_group30-39 Pitches 0.05912 0.05142 1.150 0.2502
## pitch_group40-49 Pitches 0.06343 0.05837 1.087 0.2772
## pitch_group50-59 Pitches 0.13665 0.06498 2.103 0.0355 *
## pitch_group60-69 Pitches 0.14970 0.07103 2.107 0.0351 *
## pitch_group70-79 Pitches -0.08579 0.08058 -1.065 0.2870
## pitch_group80-89 Pitches 0.06636 0.09653 0.687 0.4918
## pitch_group90-99 Pitches 0.04705 0.12637 0.372 0.7096
## pitch_groupMore Than 100 Pitches 0.09878 0.18080 0.546 0.5848
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.25 on 629336 degrees of freedom
## (8151 observations deleted due to missingness)
## Multiple R-squared: 1.95e-05, Adjusted R-squared: 3.609e-06
## F-statistic: 1.227 on 10 and 629336 DF, p-value: 0.2673
#fastballH <- as.data.frame(fastballHorzBreakBinYear$coefficients)
Changeups
changeupSpinRateBinYear <- lm(SpinRateDiffYear ~ pitch_group, data = changeupsYear)
summary(changeupSpinRateBinYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_group, data = changeupsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1151.00 -83.82 -6.54 70.98 2409.04
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.962 1.011 8.867 < 2e-16 ***
## pitch_group10-19 Pitches -1.804 1.442 -1.251 0.21078
## pitch_group20-29 Pitches -5.467 1.578 -3.465 0.00053 ***
## pitch_group30-39 Pitches -8.598 1.694 -5.075 3.89e-07 ***
## pitch_group40-49 Pitches -14.517 1.791 -8.105 5.31e-16 ***
## pitch_group50-59 Pitches -17.987 1.961 -9.173 < 2e-16 ***
## pitch_group60-69 Pitches -18.682 2.143 -8.719 < 2e-16 ***
## pitch_group70-79 Pitches -27.402 2.354 -11.640 < 2e-16 ***
## pitch_group80-89 Pitches -27.361 2.707 -10.107 < 2e-16 ***
## pitch_group90-99 Pitches -29.918 3.585 -8.346 < 2e-16 ***
## pitch_groupMore Than 100 Pitches -33.496 5.335 -6.278 3.44e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 161.2 on 123169 degrees of freedom
## (619 observations deleted due to missingness)
## Multiple R-squared: 0.003179, Adjusted R-squared: 0.003098
## F-statistic: 39.29 on 10 and 123169 DF, p-value: < 2.2e-16
#as.data.frame(changeupSpinRateBinYear$coefficients)
changeupRelSpeedBinYear <- lm(RelSpeedDiffYear ~ pitch_group, data = changeupsYear)
summary(changeupRelSpeedBinYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_group, data = changeupsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.7409 -0.8689 0.0071 0.8889 12.8830
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.170935 0.009849 17.356 < 2e-16 ***
## pitch_group10-19 Pitches -0.011529 0.014049 -0.821 0.411855
## pitch_group20-29 Pitches -0.052886 0.015376 -3.440 0.000583 ***
## pitch_group30-39 Pitches -0.129105 0.016519 -7.815 5.52e-15 ***
## pitch_group40-49 Pitches -0.254857 0.017464 -14.594 < 2e-16 ***
## pitch_group50-59 Pitches -0.339838 0.019126 -17.769 < 2e-16 ***
## pitch_group60-69 Pitches -0.494720 0.020887 -23.686 < 2e-16 ***
## pitch_group70-79 Pitches -0.577324 0.022959 -25.146 < 2e-16 ***
## pitch_group80-89 Pitches -0.645018 0.026410 -24.423 < 2e-16 ***
## pitch_group90-99 Pitches -0.733432 0.034956 -20.981 < 2e-16 ***
## pitch_groupMore Than 100 Pitches -0.729123 0.051964 -14.031 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.574 on 123582 degrees of freedom
## (206 observations deleted due to missingness)
## Multiple R-squared: 0.01806, Adjusted R-squared: 0.01798
## F-statistic: 227.3 on 10 and 123582 DF, p-value: < 2.2e-16
#as.data.frame(changeupRelSpeedBinYear$coefficients)
changeupVertBreakBinYear <- lm(VertBreakDiffYear ~ pitch_group, data = changeupsYear)
summary(changeupVertBreakBinYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_group, data = changeupsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.715 -2.195 0.030 2.282 21.855
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.10182 0.02432 -4.186 2.84e-05 ***
## pitch_group10-19 Pitches 0.02999 0.03469 0.865 0.387293
## pitch_group20-29 Pitches 0.09407 0.03797 2.478 0.013227 *
## pitch_group30-39 Pitches 0.14168 0.04080 3.473 0.000515 ***
## pitch_group40-49 Pitches 0.14364 0.04311 3.332 0.000863 ***
## pitch_group50-59 Pitches 0.26702 0.04725 5.651 1.60e-08 ***
## pitch_group60-69 Pitches 0.11880 0.05157 2.304 0.021247 *
## pitch_group70-79 Pitches 0.30002 0.05667 5.294 1.20e-07 ***
## pitch_group80-89 Pitches 0.23491 0.06518 3.604 0.000314 ***
## pitch_group90-99 Pitches 0.21402 0.08629 2.480 0.013128 *
## pitch_groupMore Than 100 Pitches -0.03948 0.12851 -0.307 0.758714
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.881 on 123210 degrees of freedom
## (578 observations deleted due to missingness)
## Multiple R-squared: 0.0005564, Adjusted R-squared: 0.0004752
## F-statistic: 6.859 on 10 and 123210 DF, p-value: 8.367e-11
#as.data.frame(changeupVertBreakBinYear$coefficients)
changeupHorzBreakBinYear <- lm(HorzBreakDiffYear ~ pitch_group, data = changeupsYear)
summary(changeupHorzBreakBinYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_group, data = changeupsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.890 -9.010 -0.349 9.083 33.684
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01578 0.06912 -0.228 0.819
## pitch_group10-19 Pitches -0.07335 0.09861 -0.744 0.457
## pitch_group20-29 Pitches 0.04296 0.10792 0.398 0.691
## pitch_group30-39 Pitches 0.17918 0.11597 1.545 0.122
## pitch_group40-49 Pitches 0.08605 0.12254 0.702 0.483
## pitch_group50-59 Pitches -0.14951 0.13440 -1.112 0.266
## pitch_group60-69 Pitches -0.05242 0.14670 -0.357 0.721
## pitch_group70-79 Pitches -0.08538 0.16098 -0.530 0.596
## pitch_group80-89 Pitches 0.14784 0.18525 0.798 0.425
## pitch_group90-99 Pitches 0.22619 0.24561 0.921 0.357
## pitch_groupMore Than 100 Pitches 0.62245 0.36604 1.700 0.089 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.99 on 122238 degrees of freedom
## (1550 observations deleted due to missingness)
## Multiple R-squared: 0.0001013, Adjusted R-squared: 1.947e-05
## F-statistic: 1.238 on 10 and 122238 DF, p-value: 0.2604
#as.data.frame(changeupHorzBreakBinYear$coefficients)
Curveballs
curveballSpinRateBinYear <- lm(SpinRateDiffYear ~ pitch_group, data = curveballsYear)
summary(curveballSpinRateBinYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_group, data = curveballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1562.89 -57.98 6.74 70.82 1951.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.13908 0.95270 1.196 0.231840
## pitch_group10-19 Pitches 5.02238 1.37771 3.645 0.000267 ***
## pitch_group20-29 Pitches 1.92152 1.51975 1.264 0.206102
## pitch_group30-39 Pitches 0.01574 1.67678 0.009 0.992510
## pitch_group40-49 Pitches -2.34638 1.83787 -1.277 0.201718
## pitch_group50-59 Pitches -8.46786 1.99723 -4.240 2.24e-05 ***
## pitch_group60-69 Pitches -9.88622 2.19267 -4.509 6.53e-06 ***
## pitch_group70-79 Pitches -10.46660 2.41838 -4.328 1.51e-05 ***
## pitch_group80-89 Pitches -15.27392 2.76171 -5.531 3.20e-08 ***
## pitch_group90-99 Pitches -9.76875 3.57655 -2.731 0.006309 **
## pitch_groupMore Than 100 Pitches -11.42645 4.93958 -2.313 0.020712 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 134.8 on 87513 degrees of freedom
## (1013 observations deleted due to missingness)
## Multiple R-squared: 0.00163, Adjusted R-squared: 0.001516
## F-statistic: 14.29 on 10 and 87513 DF, p-value: < 2.2e-16
#as.data.frame(curveballSpinRateBinYear$coefficients)
curveballRelSpeedBinYear <- lm(RelSpeedDiffYear ~ pitch_group, data = curveballsYear)
summary(curveballRelSpeedBinYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_group, data = curveballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.0839 -0.9741 -0.0455 0.9101 20.0234
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.113009 0.011614 9.731 < 2e-16 ***
## pitch_group10-19 Pitches 0.004884 0.016792 0.291 0.771
## pitch_group20-29 Pitches -0.008058 0.018520 -0.435 0.663
## pitch_group30-39 Pitches -0.116465 0.020429 -5.701 1.19e-08 ***
## pitch_group40-49 Pitches -0.155680 0.022403 -6.949 3.70e-12 ***
## pitch_group50-59 Pitches -0.284222 0.024328 -11.683 < 2e-16 ***
## pitch_group60-69 Pitches -0.380180 0.026727 -14.225 < 2e-16 ***
## pitch_group70-79 Pitches -0.441171 0.029470 -14.970 < 2e-16 ***
## pitch_group80-89 Pitches -0.528695 0.033677 -15.699 < 2e-16 ***
## pitch_group90-99 Pitches -0.496301 0.043644 -11.372 < 2e-16 ***
## pitch_groupMore Than 100 Pitches -0.459348 0.060114 -7.641 2.17e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.65 on 88282 degrees of freedom
## (244 observations deleted due to missingness)
## Multiple R-squared: 0.009777, Adjusted R-squared: 0.009665
## F-statistic: 87.17 on 10 and 88282 DF, p-value: < 2.2e-16
#as.data.frame(curveballRelSpeedBinYear$coefficients)
curveballVertBreakBinYear <- lm(VertBreakDiffYear ~ pitch_group, data = curveballsYear)
summary(curveballVertBreakBinYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_group, data = curveballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.314 -2.232 -0.176 1.873 39.778
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.11697 0.02729 -4.286 1.82e-05 ***
## pitch_group10-19 Pitches 0.11213 0.03944 2.843 0.004470 **
## pitch_group20-29 Pitches 0.14229 0.04350 3.271 0.001072 **
## pitch_group30-39 Pitches 0.16648 0.04798 3.470 0.000521 ***
## pitch_group40-49 Pitches 0.18215 0.05258 3.464 0.000532 ***
## pitch_group50-59 Pitches 0.16197 0.05710 2.837 0.004558 **
## pitch_group60-69 Pitches 0.19078 0.06274 3.041 0.002360 **
## pitch_group70-79 Pitches 0.18386 0.06910 2.661 0.007801 **
## pitch_group80-89 Pitches 0.18032 0.07900 2.282 0.022464 *
## pitch_group90-99 Pitches 0.22444 0.10231 2.194 0.028260 *
## pitch_groupMore Than 100 Pitches 0.05407 0.14082 0.384 0.701010
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.866 on 87878 degrees of freedom
## (648 observations deleted due to missingness)
## Multiple R-squared: 0.0003229, Adjusted R-squared: 0.0002091
## F-statistic: 2.838 on 10 and 87878 DF, p-value: 0.00157
#as.data.frame(curveballVertBreakBinYear$coefficients)
curveballHorzBreakBinYear <- lm(HorzBreakDiffYear ~ pitch_group, data = curveballsYear)
summary(curveballHorzBreakBinYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_group, data = curveballsYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.697 -8.961 -0.419 8.985 34.121
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.001758 0.077378 0.023 0.982
## pitch_group10-19 Pitches -0.043857 0.111885 -0.392 0.695
## pitch_group20-29 Pitches 0.056363 0.123429 0.457 0.648
## pitch_group30-39 Pitches 0.015103 0.136132 0.111 0.912
## pitch_group40-49 Pitches -0.118843 0.149226 -0.796 0.426
## pitch_group50-59 Pitches 0.123086 0.161908 0.760 0.447
## pitch_group60-69 Pitches -0.175070 0.178197 -0.982 0.326
## pitch_group70-79 Pitches 0.072103 0.196116 0.368 0.713
## pitch_group80-89 Pitches 0.145017 0.224120 0.647 0.518
## pitch_group90-99 Pitches 0.049306 0.291437 0.169 0.866
## pitch_groupMore Than 100 Pitches 0.002830 0.399660 0.007 0.994
##
## Residual standard error: 10.94 on 87444 degrees of freedom
## (1082 observations deleted due to missingness)
## Multiple R-squared: 4.718e-05, Adjusted R-squared: -6.717e-05
## F-statistic: 0.4126 on 10 and 87444 DF, p-value: 0.9415
#as.data.frame(curveballHorzBreakBinYear$coefficients)
Cutters
cutterSpinRateBinYear <- lm(SpinRateDiffYear ~ pitch_group, data = cuttersYear)
summary(cutterSpinRateBinYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_group, data = cuttersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1276.82 -53.80 3.38 61.36 1365.15
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.7401 2.0225 2.838 0.004544 **
## pitch_group10-19 Pitches -2.4351 2.9862 -0.815 0.414828
## pitch_group20-29 Pitches 0.1838 3.2922 0.056 0.955469
## pitch_group30-39 Pitches -3.7480 3.6487 -1.027 0.304344
## pitch_group40-49 Pitches -10.3194 4.1234 -2.503 0.012339 *
## pitch_group50-59 Pitches -20.2399 4.4892 -4.509 6.58e-06 ***
## pitch_group60-69 Pitches -17.8565 5.0607 -3.528 0.000419 ***
## pitch_group70-79 Pitches -22.8073 5.4923 -4.153 3.31e-05 ***
## pitch_group80-89 Pitches -12.5231 6.2138 -2.015 0.043885 *
## pitch_group90-99 Pitches -26.5169 7.5197 -3.526 0.000423 ***
## pitch_groupMore Than 100 Pitches -18.6500 10.8137 -1.725 0.084610 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 115.9 on 13279 degrees of freedom
## (110 observations deleted due to missingness)
## Multiple R-squared: 0.004475, Adjusted R-squared: 0.003726
## F-statistic: 5.969 on 10 and 13279 DF, p-value: 4.355e-09
#as.data.frame(cutterSpinRateBinYear$coefficient)
cutterRelSpeedBinYear <- lm(RelSpeedDiffYear ~ pitch_group, data = cuttersYear)
summary(cutterRelSpeedBinYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_group, data = cuttersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.2597 -0.8471 0.0108 0.8661 11.4139
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.20652 0.02642 7.818 5.79e-15 ***
## pitch_group10-19 Pitches -0.10157 0.03901 -2.603 0.00924 **
## pitch_group20-29 Pitches -0.17158 0.04301 -3.989 6.66e-05 ***
## pitch_group30-39 Pitches -0.26536 0.04770 -5.563 2.70e-08 ***
## pitch_group40-49 Pitches -0.33290 0.05386 -6.181 6.56e-10 ***
## pitch_group50-59 Pitches -0.41626 0.05865 -7.097 1.34e-12 ***
## pitch_group60-69 Pitches -0.39435 0.06611 -5.965 2.51e-09 ***
## pitch_group70-79 Pitches -0.60866 0.07165 -8.495 < 2e-16 ***
## pitch_group80-89 Pitches -0.68727 0.08113 -8.472 < 2e-16 ***
## pitch_group90-99 Pitches -0.67949 0.09799 -6.934 4.28e-12 ***
## pitch_groupMore Than 100 Pitches -0.31865 0.14170 -2.249 0.02455 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.519 on 13362 degrees of freedom
## (27 observations deleted due to missingness)
## Multiple R-squared: 0.01563, Adjusted R-squared: 0.01489
## F-statistic: 21.21 on 10 and 13362 DF, p-value: < 2.2e-16
#as.data.frame(cutterRelSpeedBinYear$coefficient)
cutterVertBreakBinYear <- lm(VertBreakDiffYear ~ pitch_group, data = cuttersYear)
summary(cutterVertBreakBinYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_group, data = cuttersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.7816 -1.8442 -0.0694 1.7192 21.9358
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03930 0.05602 -0.702 0.4830
## pitch_group10-19 Pitches -0.01649 0.08270 -0.199 0.8420
## pitch_group20-29 Pitches -0.07245 0.09122 -0.794 0.4271
## pitch_group30-39 Pitches 0.02628 0.10113 0.260 0.7950
## pitch_group40-49 Pitches 0.19637 0.11416 1.720 0.0854 .
## pitch_group50-59 Pitches 0.23781 0.12427 1.914 0.0557 .
## pitch_group60-69 Pitches 0.20182 0.14006 1.441 0.1496
## pitch_group70-79 Pitches -0.03363 0.15179 -0.222 0.8246
## pitch_group80-89 Pitches 0.01865 0.17187 0.109 0.9136
## pitch_group90-99 Pitches 0.50875 0.20796 2.446 0.0144 *
## pitch_groupMore Than 100 Pitches 0.20207 0.30017 0.673 0.5008
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.217 on 13343 degrees of freedom
## (46 observations deleted due to missingness)
## Multiple R-squared: 0.001284, Adjusted R-squared: 0.0005351
## F-statistic: 1.715 on 10 and 13343 DF, p-value: 0.07127
#as.data.frame(cutterVertBreakBinYear$coefficient)
cutterHorzBreakBinYear <- lm(HorzBreakDiffYear ~ pitch_group, data = cuttersYear)
summary(cutterHorzBreakBinYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_group, data = cuttersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.727 -8.952 -1.156 9.130 33.587
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.11248 0.19224 0.585 0.55849
## pitch_group10-19 Pitches -0.15201 0.28406 -0.535 0.59255
## pitch_group20-29 Pitches 0.01521 0.31332 0.049 0.96129
## pitch_group30-39 Pitches 0.22605 0.34802 0.650 0.51601
## pitch_group40-49 Pitches -0.03972 0.39237 -0.101 0.91937
## pitch_group50-59 Pitches -0.04007 0.42810 -0.094 0.92542
## pitch_group60-69 Pitches -0.46202 0.48122 -0.960 0.33702
## pitch_group70-79 Pitches -0.13910 0.52492 -0.265 0.79102
## pitch_group80-89 Pitches -1.56235 0.59431 -2.629 0.00858 **
## pitch_group90-99 Pitches -0.81362 0.72166 -1.127 0.25958
## pitch_groupMore Than 100 Pitches -1.57748 1.03253 -1.528 0.12659
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.02 on 13233 degrees of freedom
## (156 observations deleted due to missingness)
## Multiple R-squared: 0.0009412, Adjusted R-squared: 0.0001862
## F-statistic: 1.247 on 10 and 13233 DF, p-value: 0.2552
#as.data.frame(cutterHorzBreakBinYear$coefficient)
Sliders
sliderSpinRateBinYear <- lm(SpinRateDiffYear ~ pitch_group, data = slidersYear)
summary(sliderSpinRateBinYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_group, data = slidersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1919.61 -58.25 10.39 77.50 2297.09
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.1797 0.6708 4.740 2.14e-06 ***
## pitch_group10-19 Pitches 2.3261 0.9830 2.366 0.0180 *
## pitch_group20-29 Pitches 0.9530 1.0943 0.871 0.3838
## pitch_group30-39 Pitches -5.2996 1.2216 -4.338 1.44e-05 ***
## pitch_group40-49 Pitches -8.2343 1.3638 -6.038 1.56e-09 ***
## pitch_group50-59 Pitches -10.7516 1.5018 -7.159 8.15e-13 ***
## pitch_group60-69 Pitches -12.8987 1.6581 -7.779 7.34e-15 ***
## pitch_group70-79 Pitches -15.9171 1.8620 -8.548 < 2e-16 ***
## pitch_group80-89 Pitches -13.7920 2.1527 -6.407 1.49e-10 ***
## pitch_group90-99 Pitches -19.6681 2.8346 -6.939 3.97e-12 ***
## pitch_groupMore Than 100 Pitches -7.6781 4.0257 -1.907 0.0565 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 153.2 on 209255 degrees of freedom
## (4771 observations deleted due to missingness)
## Multiple R-squared: 0.001513, Adjusted R-squared: 0.001465
## F-statistic: 31.71 on 10 and 209255 DF, p-value: < 2.2e-16
#as.data.frame(sliderSpinRateBinYear$coefficient)
sliderRelSpeedBinYear <- lm(RelSpeedDiffYear ~ pitch_group, data = slidersYear)
summary(sliderRelSpeedBinYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_group, data = slidersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.5091 -0.9942 -0.0013 0.9925 16.4960
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.140549 0.007593 18.509 < 2e-16 ***
## pitch_group10-19 Pitches -0.007899 0.011129 -0.710 0.478
## pitch_group20-29 Pitches -0.083878 0.012392 -6.769 1.3e-11 ***
## pitch_group30-39 Pitches -0.154730 0.013829 -11.189 < 2e-16 ***
## pitch_group40-49 Pitches -0.245708 0.015445 -15.908 < 2e-16 ***
## pitch_group50-59 Pitches -0.302369 0.017004 -17.782 < 2e-16 ***
## pitch_group60-69 Pitches -0.471239 0.018803 -25.062 < 2e-16 ***
## pitch_group70-79 Pitches -0.517617 0.021109 -24.521 < 2e-16 ***
## pitch_group80-89 Pitches -0.621009 0.024365 -25.487 < 2e-16 ***
## pitch_group90-99 Pitches -0.605815 0.032147 -18.845 < 2e-16 ***
## pitch_groupMore Than 100 Pitches -0.506722 0.045360 -11.171 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.752 on 213521 degrees of freedom
## (505 observations deleted due to missingness)
## Multiple R-squared: 0.01053, Adjusted R-squared: 0.01049
## F-statistic: 227.3 on 10 and 213521 DF, p-value: < 2.2e-16
#as.data.frame(sliderRelSpeedBinYear$coefficient)
sliderVertBreakBinYear <- lm(VertBreakDiffYear ~ pitch_group, data = slidersYear)
summary(sliderVertBreakBinYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_group, data = slidersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.285 -2.178 -0.038 2.083 56.125
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0410648 0.0167616 -2.450 0.014289 *
## pitch_group10-19 Pitches 0.0495680 0.0245674 2.018 0.043631 *
## pitch_group20-29 Pitches 0.0269975 0.0273479 0.987 0.323553
## pitch_group30-39 Pitches 0.1084424 0.0305254 3.553 0.000382 ***
## pitch_group40-49 Pitches 0.0484361 0.0340953 1.421 0.155431
## pitch_group50-59 Pitches 0.0322589 0.0375425 0.859 0.390197
## pitch_group60-69 Pitches 0.0391902 0.0415190 0.944 0.345216
## pitch_group70-79 Pitches 0.1816247 0.0466087 3.897 9.75e-05 ***
## pitch_group80-89 Pitches 0.0008113 0.0537635 0.015 0.987961
## pitch_group90-99 Pitches -0.0380980 0.0709716 -0.537 0.591402
## pitch_groupMore Than 100 Pitches 0.0889536 0.1002269 0.888 0.374798
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.86 on 212698 degrees of freedom
## (1328 observations deleted due to missingness)
## Multiple R-squared: 0.000124, Adjusted R-squared: 7.699e-05
## F-statistic: 2.638 on 10 and 212698 DF, p-value: 0.003266
#as.data.frame(sliderVertBreakBinYear$coefficient)
sliderHorzBreakBinYear <- lm(HorzBreakDiffYear ~ pitch_group, data = slidersYear)
summary(sliderHorzBreakBinYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_group, data = slidersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.437 -9.242 -0.921 9.320 38.222
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.05040 0.04844 1.041 0.29804
## pitch_group10-19 Pitches -0.19822 0.07104 -2.790 0.00526 **
## pitch_group20-29 Pitches -0.01725 0.07911 -0.218 0.82739
## pitch_group30-39 Pitches 0.02523 0.08832 0.286 0.77512
## pitch_group40-49 Pitches -0.11963 0.09860 -1.213 0.22499
## pitch_group50-59 Pitches 0.02370 0.10860 0.218 0.82723
## pitch_group60-69 Pitches -0.07735 0.12032 -0.643 0.52031
## pitch_group70-79 Pitches 0.04665 0.13505 0.345 0.72979
## pitch_group80-89 Pitches 0.08510 0.15584 0.546 0.58502
## pitch_group90-99 Pitches 0.09221 0.20635 0.447 0.65498
## pitch_groupMore Than 100 Pitches -0.14450 0.29100 -0.497 0.61950
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.13 on 211257 degrees of freedom
## (2769 observations deleted due to missingness)
## Multiple R-squared: 6.55e-05, Adjusted R-squared: 1.817e-05
## F-statistic: 1.384 on 10 and 211257 DF, p-value: 0.1805
#as.data.frame(sliderHorzBreakBinYear$coefficient)
Sinkers
sinkersSpinRateBinYear <- lm(SpinRateDiffYear ~ pitch_group, data = sinkersYear)
summary(sinkersSpinRateBinYear)
##
## Call:
## lm(formula = SpinRateDiffYear ~ pitch_group, data = sinkersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -989.87 -52.80 1.22 56.76 710.06
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.219078 1.407269 0.866 0.38635
## pitch_group10-19 Pitches 4.794218 2.140123 2.240 0.02509 *
## pitch_group20-29 Pitches -0.360638 2.326097 -0.155 0.87679
## pitch_group30-39 Pitches -1.471929 2.671352 -0.551 0.58164
## pitch_group40-49 Pitches -4.527639 3.007504 -1.505 0.13223
## pitch_group50-59 Pitches -6.829206 3.340819 -2.044 0.04095 *
## pitch_group60-69 Pitches -11.422726 3.607824 -3.166 0.00155 **
## pitch_group70-79 Pitches -12.984389 4.004572 -3.242 0.00119 **
## pitch_group80-89 Pitches -0.003152 4.488724 -0.001 0.99944
## pitch_group90-99 Pitches -12.918962 5.539613 -2.332 0.01971 *
## pitch_groupMore Than 100 Pitches 2.152943 8.189170 0.263 0.79263
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 97.48 on 17972 degrees of freedom
## (62 observations deleted due to missingness)
## Multiple R-squared: 0.002452, Adjusted R-squared: 0.001897
## F-statistic: 4.417 on 10 and 17972 DF, p-value: 3.124e-06
sinkersRelSpeedBinYear <- lm(RelSpeedDiffYear ~ pitch_group, data = sinkersYear)
summary(sinkersRelSpeedBinYear)
##
## Call:
## lm(formula = RelSpeedDiffYear ~ pitch_group, data = sinkersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.9628 -0.6982 0.0325 0.7699 7.7123
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.34539 0.01891 18.266 < 2e-16 ***
## pitch_group10-19 Pitches -0.21119 0.02874 -7.349 2.08e-13 ***
## pitch_group20-29 Pitches -0.28972 0.03125 -9.271 < 2e-16 ***
## pitch_group30-39 Pitches -0.48009 0.03589 -13.376 < 2e-16 ***
## pitch_group40-49 Pitches -0.58259 0.04039 -14.423 < 2e-16 ***
## pitch_group50-59 Pitches -0.72496 0.04490 -16.147 < 2e-16 ***
## pitch_group60-69 Pitches -0.83242 0.04846 -17.177 < 2e-16 ***
## pitch_group70-79 Pitches -0.91636 0.05375 -17.049 < 2e-16 ***
## pitch_group80-89 Pitches -0.85191 0.06027 -14.134 < 2e-16 ***
## pitch_group90-99 Pitches -0.93975 0.07445 -12.623 < 2e-16 ***
## pitch_groupMore Than 100 Pitches -0.85333 0.11006 -7.754 9.41e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.31 on 17993 degrees of freedom
## (41 observations deleted due to missingness)
## Multiple R-squared: 0.04985, Adjusted R-squared: 0.04932
## F-statistic: 94.4 on 10 and 17993 DF, p-value: < 2.2e-16
sinkersVertBreakBinYear <- lm(VertBreakDiffYear ~ pitch_group, data = sinkersYear)
summary(sinkersVertBreakBinYear)
##
## Call:
## lm(formula = VertBreakDiffYear ~ pitch_group, data = sinkersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.5546 -1.6612 0.0173 1.6575 17.4043
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.032401 0.040722 -0.796 0.4262
## pitch_group10-19 Pitches 0.087538 0.061873 1.415 0.1571
## pitch_group20-29 Pitches 0.058048 0.067317 0.862 0.3885
## pitch_group30-39 Pitches -0.041624 0.077289 -0.539 0.5902
## pitch_group40-49 Pitches -0.012714 0.086989 -0.146 0.8838
## pitch_group50-59 Pitches -0.008553 0.096734 -0.088 0.9295
## pitch_group60-69 Pitches 0.070180 0.104330 0.673 0.5012
## pitch_group70-79 Pitches -0.055414 0.115709 -0.479 0.6320
## pitch_group80-89 Pitches 0.094518 0.129866 0.728 0.4667
## pitch_group90-99 Pitches 0.366821 0.160268 2.289 0.0221 *
## pitch_groupMore Than 100 Pitches 0.070593 0.236922 0.298 0.7657
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.82 on 17980 degrees of freedom
## (54 observations deleted due to missingness)
## Multiple R-squared: 0.0005471, Adjusted R-squared: -8.771e-06
## F-statistic: 0.9842 on 10 and 17980 DF, p-value: 0.4545
sinkersHorzBreakBinYear <- lm(HorzBreakDiffYear ~ pitch_group, data = sinkersYear)
summary(sinkersHorzBreakBinYear)
##
## Call:
## lm(formula = HorzBreakDiffYear ~ pitch_group, data = sinkersYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.581 -8.800 -0.734 8.828 41.458
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.07817 0.15745 0.496 0.6196
## pitch_group10-19 Pitches -0.14002 0.23928 -0.585 0.5584
## pitch_group20-29 Pitches -0.13743 0.26034 -0.528 0.5976
## pitch_group30-39 Pitches 0.01033 0.29893 0.035 0.9724
## pitch_group40-49 Pitches -0.25693 0.33676 -0.763 0.4455
## pitch_group50-59 Pitches 0.23568 0.37500 0.628 0.5297
## pitch_group60-69 Pitches -0.54405 0.40475 -1.344 0.1789
## pitch_group70-79 Pitches -0.50730 0.45096 -1.125 0.2606
## pitch_group80-89 Pitches -0.36109 0.50530 -0.715 0.4749
## pitch_group90-99 Pitches 0.85636 0.62559 1.369 0.1711
## pitch_groupMore Than 100 Pitches 1.95723 0.91587 2.137 0.0326 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.86 on 17800 degrees of freedom
## (234 observations deleted due to missingness)
## Multiple R-squared: 0.0006843, Adjusted R-squared: 0.0001229
## F-statistic: 1.219 on 10 and 17800 DF, p-value: 0.2727
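The per-pitch-type blocks above all fit the same four models against pitch_group. A minimal sketch of doing that in a loop, assuming a data frame shaped like cuttersYear or sinkersYear. The helper fit_bin_models and the toy data are hypothetical and not part of the notebook:

```r
# Hypothetical helper: fit one lm per "<metric>DiffYear" column against pitch_group.
fit_bin_models <- function(df, metrics = c("SpinRate", "RelSpeed", "VertBreak", "HorzBreak")) {
  models <- lapply(metrics, function(m) {
    # Builds a formula such as SpinRateDiffYear ~ pitch_group
    lm(reformulate("pitch_group", response = paste0(m, "DiffYear")), data = df)
  })
  setNames(models, metrics)
}

# Toy stand-in for a per-pitch-type data frame (made-up values).
set.seed(1)
toy <- data.frame(
  pitch_group       = factor(rep(c("0-9 Pitches", "10-19 Pitches"), each = 20)),
  SpinRateDiffYear  = rnorm(40),
  RelSpeedDiffYear  = rnorm(40),
  VertBreakDiffYear = rnorm(40),
  HorzBreakDiffYear = rnorm(40)
)

toy_models <- fit_bin_models(toy)
names(toy_models)
```

Each element of the returned list can then be passed to summary() as in the blocks above.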
##Visualizing Regression Coefficients
library(jtools)
plot_summs(fastballSpinRateBinYear, changeupSpinRateBinYear, curveballSpinRateBinYear, cutterSpinRateBinYear, sliderSpinRateBinYear, model.names = c('Fastball', 'ChangeUp', 'Curveball', 'Cutter', 'Slider'), omit.coefs = c('(Intercept)', 'pitch_groupMore Than 100 Pitches'))
## Registered S3 methods overwritten by 'broom':
## method from
## tidy.glht jtools
## tidy.summary.glht jtools
## Loading required namespace: broom.mixed
Release Speed Summary Graph
plot_summs(fastballRelSpeedBinYear, changeupRelSpeedBinYear, curveballRelSpeedBinYear, cutterRelSpeedBinYear, sliderRelSpeedBinYear, model.names = c('Fastball', 'ChangeUp', 'Curveball', 'Cutter', 'Slider'), omit.coefs = c('(Intercept)', 'pitch_groupMore Than 100 Pitches'))
Vertical Break Summary Graph
plot_summs(fastballVertBreakBinYear, changeupVertBreakBinYear, curveballVertBreakBinYear, cutterVertBreakBinYear, sliderVertBreakBinYear, model.names = c('Fastball', 'ChangeUp', 'Curveball', 'Cutter', 'Slider'), omit.coefs = c('(Intercept)', 'pitch_groupMore Than 100 Pitches'))
Horizontal Break Summary Graph
plot_summs(fastballHorzBreakBinYear, changeupHorzBreakBinYear, curveballHorzBreakBinYear, cutterHorzBreakBinYear, sliderHorzBreakBinYear, model.names = c('Fastball', 'ChangeUp', 'Curveball', 'Cutter', 'Slider'), omit.coefs = c('(Intercept)', 'pitch_groupMore Than 100 Pitches'))
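As far as I understand plot_summs, omit.coefs expects coefficient names as they appear in the fitted model, so it can help to print those names before plotting. A small sketch on made-up data; the toy data frame is an assumption, and its level labels mirror the regression output above:

```r
# Toy data with three of the pitch_group levels (made-up values).
set.seed(2)
bin_levels <- c("0-9 Pitches", "10-19 Pitches", "More Than 100 Pitches")
toy <- data.frame(
  pitch_group      = factor(rep(bin_levels, each = 10), levels = bin_levels),
  SpinRateDiffYear = rnorm(30)
)

toy_mod <- lm(SpinRateDiffYear ~ pitch_group, data = toy)

# These are the strings a coefficient filter would have to match.
coef_names <- names(coef(toy_mod))
coef_names
```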
data1ac <- sqldf("select
CASE WHEN PitchCall = 'InPlay' THEN TaggedHitType
WHEN PitchCall = 'BallCalled' AND KorBB = 'Walk' THEN KorBB
WHEN PitchCall = 'StrikeSwinging' AND KorBB = 'Strikeout' THEN 'SwingingStikeout'
WHEN PitchCall = 'StrikeCalled' AND KorBB = 'Strikeout' THEN 'StrikeoutLooking'
ELSE PitchCall END AS Result
FROM pitches" )
pitches['Result'] = data1ac['Result']
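The same CASE logic can be written without a round trip through sqldf. A minimal sketch using dplyr::case_when on made-up rows (the toy data frame and its values are assumptions; the column names are the ones used in the query above):

```r
library(dplyr)

# Made-up rows covering each branch of the CASE expression.
toy <- data.frame(
  PitchCall     = c("InPlay", "BallCalled", "StrikeSwinging", "StrikeCalled", "FoulBall"),
  KorBB         = c("Undefined", "Walk", "Strikeout", "Strikeout", "Undefined"),
  TaggedHitType = c("GroundBall", "Undefined", "Undefined", "Undefined", "Undefined")
)

toy <- toy %>%
  mutate(Result = case_when(
    PitchCall == "InPlay" ~ TaggedHitType,
    PitchCall == "BallCalled" & KorBB == "Walk" ~ KorBB,
    # Spelling kept as in the SQL above so downstream filters still match.
    PitchCall == "StrikeSwinging" & KorBB == "Strikeout" ~ "SwingingStikeout",
    PitchCall == "StrikeCalled" & KorBB == "Strikeout" ~ "StrikeoutLooking",
    TRUE ~ PitchCall
  ))

toy$Result
```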
The . in the column name Top.Bottom is causing problems. Let’s fix that by copying it to TopOrBottom.
pitches$TopOrBottom <- pitches$Top.Bottom
I only want one row for each plate appearance: the last one, which shows how it ended (with a hit, walk, home run, strikeout, etc.). I can’t think of many ways to do this other than another big for loop. If it’s ordered by date, I’ll loop through it and compare each BatterId to the next BatterId. If they are different, I add that row to my new dataset.
intermediate_step <- sqldf("SELECT PitcherId, BatterTeam, GameID, PitcherTeam, Date, Inning, TopOrBottom, PAofInning, BatterId, Result, PlayResult, KorBB, pitch_count - PitchofPA + 1 AS CountatFirstPitch, PitchofPA from pitches ORDER BY GameID, BatterTeam, Inning, PAofInning ")
length(intermediate_step$BatterId)
## [1] 1144455
#We'll run into problems with the occasional batter without a BatterId (NA)
#Therefore, I'm going to replace the NA BatterIds with 0000 or something
#First let's see how many there are
mean(is.na(intermediate_step$BatterId))
## [1] 0.0002621335
#Looks like 0.026 percent. That's not bad. It isn't a problem unless there are 2 consecutive batters with no id
#I'm going to make NA BatterIds equal 999999
intermediate_step$BatterId <- intermediate_step$BatterId %>% replace(is.na(.), 999999)
last_pitches = intermediate_step %>%
group_by(GameID, Inning, BatterId, PAofInning) %>%
slice_tail(n=1) %>%
arrange(GameID, Inning, PitcherTeam, PAofInning)
dim(last_pitches)
## [1] 300503 14
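One way to make "last pitch of the plate appearance" explicit, rather than relying on row order within each group, is to keep the row with the largest PitchofPA. A minimal sketch on made-up rows; adding TopOrBottom to the grouping is an assumption, not something the notebook does:

```r
library(dplyr)

# Made-up pitches: two plate appearances in the top half, one in the bottom half.
toy <- data.frame(
  GameID      = "G1",
  Inning      = 1L,
  TopOrBottom = c("Top", "Top", "Top", "Bottom", "Bottom"),
  PAofInning  = c(1L, 1L, 2L, 1L, 1L),
  BatterId    = c(11, 11, 12, 21, 21),
  PitchofPA   = c(2L, 1L, 1L, 1L, 2L),
  Result      = c("Single", "BallCalled", "Out", "StrikeCalled", "Walk")
)

# Keep the highest-numbered pitch of each plate appearance, whatever the row order.
last_toy <- toy %>%
  group_by(GameID, Inning, TopOrBottom, PAofInning, BatterId) %>%
  slice_max(PitchofPA, n = 1, with_ties = FALSE) %>%
  ungroup()

last_toy
```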
#New code chunk just to take a look at last_pitches
head(last_pitches, 300)
| PitcherId <dbl> | BatterTeam <chr> | GameID <chr> | PitcherTeam <chr> | Date <chr> | Inning <int> | TopOrBottom <chr> |
|---|---|---|---|---|---|---|
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 1000017008 | LON_DIR | 20190223-BlairField-1 | NEV_WOL | 2019-02-23 | 1 | Bottom |
| 1000017008 | LON_DIR | 20190223-BlairField-1 | NEV_WOL | 2019-02-23 | 1 | Bottom |
| 1000017008 | LON_DIR | 20190223-BlairField-1 | NEV_WOL | 2019-02-23 | 1 | Bottom |
# Add a new factor column to the dataframe for the pitch group
last_pitches$first_pitch_bin <- as.factor(ifelse(last_pitches$CountatFirstPitch < 100, (last_pitches$CountatFirstPitch) %/% 10 + 1, 11))
# Check the updated dataframe
head(last_pitches, 250)
| PitcherId <dbl> | BatterTeam <chr> | GameID <chr> | PitcherTeam <chr> | Date <chr> | Inning <int> | TopOrBottom <chr> |
|---|---|---|---|---|---|---|
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 8900656 | NEV_WOL | 20190223-BlairField-1 | LON_DIR | 2019-02-23 | 1 | Top |
| 1000017008 | LON_DIR | 20190223-BlairField-1 | NEV_WOL | 2019-02-23 | 1 | Bottom |
| 1000017008 | LON_DIR | 20190223-BlairField-1 | NEV_WOL | 2019-02-23 | 1 | Bottom |
| 1000017008 | LON_DIR | 20190223-BlairField-1 | NEV_WOL | 2019-02-23 | 1 | Bottom |
#Let's make a first_pitch_group column
last_pitches$first_pitch_group <- NA
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '1'] <- '0-9 pitches'
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '2'] <- '10-19 pitches'
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '3'] <- '20-29 pitches'
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '4'] <- '30-39 pitches'
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '5'] <- '40-49 pitches'
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '6'] <- '50-59 pitches'
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '7'] <- '60-69 pitches'
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '8'] <- '70-79 pitches'
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '9'] <- '80-89 pitches'
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '10'] <- '90-99 pitches'
last_pitches$first_pitch_group[last_pitches$first_pitch_bin == '11'] <- 'More Than 100 Pitches'
#Make sure the order is correct. It's really annoying if the regression output isn't in ascending order
sqldf("SELECT first_pitch_group, count(*) from last_pitches GROUP BY first_pitch_group ORDER BY first_pitch_group")
| first_pitch_group <chr> | count(*) <int> |
|---|---|
| NA | 823 |
| 0-9 pitches | 89373 |
| 10-19 pitches | 58844 |
| 20-29 pitches | 41731 |
| 30-39 pitches | 30461 |
| 40-49 pitches | 23003 |
| 50-59 pitches | 17982 |
| 60-69 pitches | 14511 |
| 70-79 pitches | 10884 |
| 80-89 pitches | 7399 |
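The bin number and its label can also be assigned in one step with factor(), which fixes the level order explicitly instead of depending on how the strings happen to sort. A minimal sketch; the helper label_pitch_bin is hypothetical, and the labels are copied from the assignments above:

```r
bin_labels <- c(
  "0-9 pitches", "10-19 pitches", "20-29 pitches", "30-39 pitches",
  "40-49 pitches", "50-59 pitches", "60-69 pitches", "70-79 pitches",
  "80-89 pitches", "90-99 pitches", "More Than 100 Pitches"
)

# Hypothetical helper: map a pitch count to its labelled bin.
# Counts of 100 or more all fall into the last bin; NA stays NA.
label_pitch_bin <- function(count) {
  bin <- ifelse(count < 100, count %/% 10 + 1, 11)
  factor(bin, levels = 1:11, labels = bin_labels)
}

groups <- label_pitch_bin(c(0, 9, 10, 99, 100, 137, NA))
as.character(groups)
levels(groups)
```

Because the levels are set explicitly, lm() would use "0-9 pitches" as the baseline and report the other bins in ascending order.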
last_pitches is looking pretty good. I need a way to quantify the result of the play. I’ll probably add a walk column, a hit column, and a total bases column.
last_pitches$result2 <- NA
last_pitches$result3 <- NA
last_pitches$result2 <- ifelse(!(is.na(last_pitches$PlayResult)) & last_pitches$PlayResult != 'Undefined', last_pitches$PlayResult, NA)
last_pitches$result2 <- ifelse(!(is.na(last_pitches$KorBB)) & last_pitches$KorBB != 'Undefined', last_pitches$Result, last_pitches$result2)
last_pitches$result2 <- ifelse(!(is.na(last_pitches$Result)) & last_pitches$Result == 'HitByPitch', last_pitches$Result, last_pitches$result2)
last_pitches$result2 <- ifelse(is.na(last_pitches$result2), "###ISSUE###", last_pitches$result2)
last_pitches$result3 <- ifelse(is.na(last_pitches$result2), last_pitches$Result, last_pitches$result3)
sqldf("SELECT result2, count(*) FROM last_pitches GROUP BY result2")
| result2 <chr> | count(*) <int> |
|---|---|
|  | 112 |
| ###ISSUE### | 1452 |
| BallIntentional | 685 |
| BallinDirt | 415 |
| Double | 13099 |
| Error | 4836 |
| FieldersChoice | 3735 |
| Fielderschoice | 13 |
| FlyBall | 2 |
| GroundBall | 1 |
Fix the nonsense: keep only real plate-appearance outcomes and merge the inconsistently capitalized labels.
newdata <- last_pitches[(last_pitches$result2 %in% c('Double','Error','FieldersChoice', 'Fielderschoice', 'HitByPitch', 'HomeRun', 'Homerun', 'Out', 'Sacrifice', 'Single', 'StrikeoutLooking', 'SwingingStikeout', 'Triple', 'Walk')), ]
newdata1 <- newdata
newdata1$result2[newdata$result2 == 'Fielderschoice'] <- 'FieldersChoice'
newdata1$result2[newdata$result2 == 'Homerun'] <- 'HomeRun'
If that works, let me write it to CSV real quick.
write.csv(newdata1, 'C:\\Users\\Nick\\UCSB Baseball\\NickAllCollegeTrackmanAtBatsNew.csv')
atBats <- read.csv('C:\\Users\\Nick\\UCSB Baseball\\NickAllCollegeTrackmanAtBatsNew.csv')
Now that the atBats dataframe is ready, let’s run some regressions on results-based metrics: WHIP and FIP.
#WHIP and FIP regressions
##WHIP
#Making some dummy columns
library(fastDummies)
lastPitches <- dummy_cols(atBats, select_columns = 'result2')
##WHIP regression
lastPitches$walksPlusHits <- lastPitches$result2_Walk + lastPitches$result2_Single + lastPitches$result2_Double + lastPitches$result2_Triple + lastPitches$result2_HomeRun + lastPitches$result2_HitByPitch
lastPitches$outs <- lastPitches$result2_StrikeoutLooking + lastPitches$result2_SwingingStikeout + lastPitches$result2_FieldersChoice + lastPitches$result2_Out + lastPitches$result2_Sacrifice
OK, let’s group by first_pitch_group and combine these dummy columns to calculate opponent average, WHIP, etc.
whip_bin_avgs <- sqldf("SELECT first_pitch_group, count(*),
cast((sum(cast(walksPlusHits as float)) / sum(cast(outs as float)) * 3.00) as float) AS whip FROM lastPitches group by first_pitch_group")
whip_bin_avgs
| first_pitch_group <chr> | count(*) <int> | whip <dbl> |
|---|---|---|
| NA | 809 | 2.455782 |
| 0-9 pitches | 88714 | 1.775129 |
| 10-19 pitches | 58247 | 1.730861 |
| 20-29 pitches | 41325 | 1.663266 |
| 30-39 pitches | 30163 | 1.752321 |
| 40-49 pitches | 22821 | 1.726143 |
| 50-59 pitches | 17811 | 1.666874 |
| 60-69 pitches | 14384 | 1.693023 |
| 70-79 pitches | 10794 | 1.669259 |
| 80-89 pitches | 7328 | 1.573573 |
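The whip column above is (walks + hits) divided by innings pitched, where innings pitched is outs divided by 3; that is why the query multiplies the ratio by 3. Note that the notebook's walksPlusHits also counts hit batters. A worked sketch with made-up totals (the whip helper is hypothetical):

```r
# WHIP = (walks + hits) / innings pitched, with innings pitched = outs / 3.
whip <- function(walks_plus_hits, outs) {
  walks_plus_hits / (outs / 3)
}

# Made-up totals: 30 baserunners allowed while recording 60 outs (20 innings).
whip(30, 60)   # 1.5

# Same arithmetic written the way the SQL query does it.
30 / 60 * 3    # 1.5
```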
I should probably write another SQL query like the one above that gets each pitcher’s individual WHIP, so we can see whether it is higher when the pitch count is higher. Either way, I need yet another query that groups the at-bats in the lastPitches df by both pitcher and pitch-count bin.
whip_bin_pitcher_avgs <- sqldf("SELECT first_pitch_group, PitcherId, count(*) as numAtBats,
cast((sum(cast(walksPlusHits as float)) / sum(cast(outs as float)) * 3.00) as float) AS whip_bin FROM lastPitches group by first_pitch_group, PitcherId ORDER BY count(*) DESC")
#Note: despite the ten_plus name, this keeps pitcher/bin groups with numAtBats of at least 4
whip_bin_pitcher_avgs_ten_plus <- whip_bin_pitcher_avgs[whip_bin_pitcher_avgs$numAtBats >= 4, ]
whip_pitcher_avgs <- sqldf("SELECT PitcherId, count(*),
cast((sum(cast(walksPlusHits as float)) / sum(cast(outs as float)) * 3.00) as float) AS whip_avg FROM lastPitches group by PitcherId ORDER BY count(*) DESC")
whip_pitcher_avgs
| PitcherId <dbl> | count(*) <int> | whip_avg <dbl> |
|---|---|---|
| 1000049176 | 490 | 1.4860681 |
| 1000036922 | 395 | 1.5944882 |
| 1000026610 | 391 | 1.9173913 |
| 1000085193 | 386 | 1.0249110 |
| 1000080401 | 373 | 1.7816594 |
| 1000049078 | 367 | 1.8616071 |
| 1000010353 | 367 | 1.6282051 |
| 1000058539 | 349 | 1.6986301 |
| 1000029326 | 347 | 1.0546875 |
| 1000051677 | 340 | 1.3318966 |
whipdf <- left_join(whip_bin_pitcher_avgs_ten_plus, whip_pitcher_avgs, by = 'PitcherId')
whipdf$diff <- whipdf$whip_bin - whipdf$whip_avg
Linear regression of the WHIP difference against pitch-count bin, weighted by number of at-bats
whip_diff_mod <- lm(diff ~ first_pitch_group, data = whipdf, weights = numAtBats)
summary(whip_diff_mod)
##
## Call:
## lm(formula = diff ~ first_pitch_group, data = whipdf, weights = numAtBats)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -51.320 -2.796 -0.903 1.664 84.547
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.14448 0.01912 7.555 4.35e-14 ***
## first_pitch_group10-19 pitches 0.04997 0.03073 1.626 0.104020
## first_pitch_group20-29 pitches 0.05781 0.03480 1.661 0.096650 .
## first_pitch_group30-39 pitches 0.24717 0.03944 6.267 3.75e-10 ***
## first_pitch_group40-49 pitches 0.22669 0.04426 5.122 3.05e-07 ***
## first_pitch_group50-59 pitches 0.15657 0.04929 3.177 0.001493 **
## first_pitch_group60-69 pitches 0.25429 0.05391 4.716 2.42e-06 ***
## first_pitch_group70-79 pitches 0.24326 0.06176 3.939 8.22e-05 ***
## first_pitch_group80-89 pitches 0.20015 0.07473 2.678 0.007402 **
## first_pitch_group90-99 pitches 0.37981 0.10675 3.558 0.000375 ***
## first_pitch_groupMore Than 100 Pitches 0.29814 0.17317 1.722 0.085148 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.524 on 21553 degrees of freedom
## (165 observations deleted due to missingness)
## Multiple R-squared: 0.004121, Adjusted R-squared: 0.003659
## F-statistic: 8.919 on 10 and 21553 DF, p-value: 8.359e-15
whip_diff_mod
##
## Call:
## lm(formula = diff ~ first_pitch_group, data = whipdf, weights = numAtBats)
##
## Coefficients:
## (Intercept) first_pitch_group10-19 pitches
## 0.14448 0.04997
## first_pitch_group20-29 pitches first_pitch_group30-39 pitches
## 0.05781 0.24717
## first_pitch_group40-49 pitches first_pitch_group50-59 pitches
## 0.22669 0.15657
## first_pitch_group60-69 pitches first_pitch_group70-79 pitches
## 0.25429 0.24326
## first_pitch_group80-89 pitches first_pitch_group90-99 pitches
## 0.20015 0.37981
## first_pitch_groupMore Than 100 Pitches
## 0.29814
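To read these coefficients: with a single categorical predictor and weights, the intercept is the weighted mean of diff in the baseline bin, and each other coefficient is that bin's weighted mean minus the baseline's. A minimal sketch on made-up numbers (the toy data frame is an assumption; the column names follow whipdf):

```r
# Made-up differences for two bins, with at-bat counts as weights.
toy <- data.frame(
  first_pitch_group = factor(c("0-9 pitches", "0-9 pitches",
                               "10-19 pitches", "10-19 pitches")),
  diff      = c(0.10, 0.30, 0.50, 0.90),
  numAtBats = c(10, 30, 20, 20)
)
toy_mod <- lm(diff ~ first_pitch_group, data = toy, weights = numAtBats)

# Weighted mean of the baseline bin; should equal the intercept.
baseline_mean <- with(subset(toy, first_pitch_group == "0-9 pitches"),
                      weighted.mean(diff, numAtBats))

# Weighted mean of the second bin; should equal intercept + its coefficient.
second_mean <- with(subset(toy, first_pitch_group == "10-19 pitches"),
                    weighted.mean(diff, numAtBats))

coef(toy_mod)
c(baseline_mean, second_mean)
```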
##FIP regression
FIP uses only walks, hit batters, home runs, and strikeouts: FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.
lastPitches$BBFIP <- (lastPitches$result2_Walk + lastPitches$result2_HitByPitch) * 3.00
lastPitches$HRFIP <- lastPitches$result2_HomeRun * 13.00
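#Count both looking and swinging strikeouts (result2_Strikeout alone only partial-matches result2_StrikeoutLooking)
lastPitches$KFIP <- (lastPitches$result2_StrikeoutLooking + lastPitches$result2_SwingingStikeout) * (-2.00)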
lastPitches$NUMERATOR <- (lastPitches$BBFIP + lastPitches$HRFIP + lastPitches$KFIP)*3
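For reference, the columns above are the pieces of the standard FIP formula, (13*HR + 3*(BB + HBP) - 2*K) / IP + constant, with IP = outs / 3. A worked sketch with made-up totals; the fip helper is hypothetical, and the 3.90 constant is the one this notebook uses:

```r
# FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant, with IP = outs / 3.
fip <- function(hr, bb, hbp, k, outs, constant = 3.90) {
  (13 * hr + 3 * (bb + hbp) - 2 * k) / (outs / 3) + constant
}

# Made-up totals: 2 HR, 8 BB, 1 HBP, 25 K over 81 outs (27 innings).
# Numerator: 13*2 + 3*(8 + 1) - 2*25 = 3, so FIP = 3 / 27 + 3.90.
toy_fip <- fip(hr = 2, bb = 8, hbp = 1, k = 25, outs = 81)
toy_fip
```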
OK, let’s group by first_pitch_group and combine these dummy columns to calculate FIP for each bin. The FIP constant being used is 3.90.
fip_bin_avgs <- sqldf("SELECT first_pitch_group, count(*),
sum(NUMERATOR)/sum(outs) + 3.90 AS fip FROM lastPitches group by first_pitch_group")
fip_bin_avgs
| first_pitch_group <chr> | count(*) <int> | fip <dbl> |
|---|---|---|
| NA | 809 | 9.770748 |
| 0-9 pitches | 88714 | 6.814819 |
| 10-19 pitches | 58247 | 6.761921 |
| 20-29 pitches | 41325 | 6.558911 |
| 30-39 pitches | 30163 | 6.746302 |
| 40-49 pitches | 22821 | 6.816579 |
| 50-59 pitches | 17811 | 6.583818 |
| 60-69 pitches | 14384 | 6.583056 |
| 70-79 pitches | 10794 | 6.623991 |
| 80-89 pitches | 7328 | 6.584778 |
I should probably write another SQL query like the one above that gets each pitcher’s individual FIP, so we can see whether it is higher when the pitch count is higher. Maybe I’ll get that pitcher’s FIP with, say, 0-20 pitches and see whether there is an increase when the count is higher. Either way, I need yet another query that groups the at-bats in the lastPitches df by both pitcher and pitch-count bin.
fip_bin_pitcher_avgs <- sqldf("SELECT first_pitch_group, PitcherId, count(*) as numAtBats,
sum(NUMERATOR)/sum(outs) + 3.90 AS fip_bin FROM lastPitches group by first_pitch_group, PitcherId ORDER BY count(*) DESC")
#Note: despite the ten_plus name, this keeps pitcher/bin groups with numAtBats of at least 4
fip_bin_pitcher_avgs_ten_plus <- fip_bin_pitcher_avgs[fip_bin_pitcher_avgs$numAtBats >= 4, ]
fip_pitcher_avgs <- sqldf("SELECT PitcherId, count(*),
sum(NUMERATOR)/sum(outs) + 3.90 AS fip_avg FROM lastPitches group by PitcherId ORDER BY count(*) DESC")
fip_pitcher_avgs
| PitcherId <dbl> | count(*) <int> | fip_avg <dbl> |
|---|---|---|
| 1000049176 | 490 | 5.841176 |
| 1000036922 | 395 | 6.073228 |
| 1000026610 | 391 | 6.039130 |
| 1000085193 | 386 | 5.746975 |
| 1000080401 | 373 | 5.498253 |
| 1000049078 | 367 | 7.315179 |
| 1000010353 | 367 | 7.810256 |
| 1000058539 | 349 | 6.571233 |
| 1000029326 | 347 | 5.575781 |
| 1000051677 | 340 | 6.848276 |
fipdf <- left_join(fip_bin_pitcher_avgs_ten_plus, fip_pitcher_avgs, by = 'PitcherId')
fipdf$diff_fip <- fipdf$fip_bin - fipdf$fip_avg
Linear regression of the FIP difference against pitch-count bin, weighted by number of at-bats
fip_diff_mod <- lm(diff_fip ~ first_pitch_group, data = fipdf, weights = numAtBats)
summary(fip_diff_mod)
##
## Call:
## lm(formula = diff_fip ~ first_pitch_group, data = fipdf, weights = numAtBats)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -221.240 -7.973 -2.804 4.303 313.085
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.22021 0.05877 3.747 0.000180 ***
## first_pitch_group10-19 pitches 0.16630 0.09446 1.761 0.078326 .
## first_pitch_group20-29 pitches 0.05993 0.10694 0.560 0.575229
## first_pitch_group30-39 pitches 0.63672 0.12121 5.253 1.51e-07 ***
## first_pitch_group40-49 pitches 0.57933 0.13603 4.259 2.06e-05 ***
## first_pitch_group50-59 pitches 0.34574 0.15148 2.282 0.022481 *
## first_pitch_group60-69 pitches 0.48029 0.16570 2.898 0.003754 **
## first_pitch_group70-79 pitches 0.67495 0.18982 3.556 0.000378 ***
## first_pitch_group80-89 pitches 0.67152 0.22966 2.924 0.003460 **
## first_pitch_group90-99 pitches 0.89788 0.32810 2.737 0.006213 **
## first_pitch_groupMore Than 100 Pitches 1.00923 0.53223 1.896 0.057943 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.98 on 21553 degrees of freedom
## (165 observations deleted due to missingness)
## Multiple R-squared: 0.002878, Adjusted R-squared: 0.002416
## F-statistic: 6.222 on 10 and 21553 DF, p-value: 1.424e-09
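When reading this output, it helps to remember that `first_pitch_group` is treatment-coded: the intercept is the weighted mean difference for the reference bin ("0-9 pitches", first alphabetically), and each other coefficient is that bin's offset from the reference. The sketch below checks this on made-up data; `toy`, `bin`, `diff`, and `n` are hypothetical names, not objects from this notebook.

```r
# Toy data: invented differences for three bins, with invented weights.
toy <- data.frame(
  bin = factor(c("0-9", "0-9", "10-19", "10-19", "20-29", "20-29")),
  diff = c(0.1, 0.3, 0.5, 0.9, -0.2, 0.4),
  n = c(4, 12, 5, 5, 6, 2)
)

# Weighted regression on a single categorical predictor.
fit <- lm(diff ~ bin, data = toy, weights = n)

# Weighted mean of diff within each bin, computed directly.
w_means <- sapply(split(toy, toy$bin), function(g) weighted.mean(g$diff, g$n))

# Intercept = reference-bin mean; intercept + coefficient = mean of each other bin.
implied <- c(coef(fit)[1], coef(fit)[1] + coef(fit)[-1])
names(implied) <- levels(toy$bin)

all.equal(unname(implied), unname(w_means))
```

So a bin's estimated mean difference is the intercept plus that bin's coefficient, not the coefficient alone.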
as.data.frame(fip_diff_mod$coefficients)
| | fip_diff_mod$coefficients <dbl> |
|---|---|
| (Intercept) | 0.22021394 |
| first_pitch_group10-19 pitches | 0.16630244 |
| first_pitch_group20-29 pitches | 0.05992669 |
| first_pitch_group30-39 pitches | 0.63671842 |
| first_pitch_group40-49 pitches | 0.57932858 |
| first_pitch_group50-59 pitches | 0.34573559 |
| first_pitch_group60-69 pitches | 0.48028900 |
| first_pitch_group70-79 pitches | 0.67495153 |
| first_pitch_group80-89 pitches | 0.67152095 |
| first_pitch_group90-99 pitches | 0.89788063 |
#BABIP (Batting Average on Balls In Play)
# Hits on balls in play (home runs are counted as hits here)
lastPitches$hits <- lastPitches$result2_Single + lastPitches$result2_Double +
  lastPitches$result2_Triple + lastPitches$result2_HomeRun
# Balls in play: hits plus fielder's choices, outs, and sacrifices
lastPitches$BIP <- lastPitches$hits + lastPitches$result2_FieldersChoice +
  lastPitches$result2_Out + lastPitches$result2_Sacrifice
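Note that this definition counts home runs as balls in play, whereas the conventional BABIP formula, (H - HR) / (AB - K - HR + SF), excludes them. The sketch below compares the two on made-up season totals; all variable names and numbers are hypothetical, and the second variant only removes home runs rather than reproducing the conventional formula exactly.

```r
# Hypothetical season totals for one pitcher (made-up numbers).
singles <- 60; doubles <- 15; triples <- 2; home_runs <- 8
outs_in_play <- 170; fielders_choice <- 5; sacrifices <- 6

# Variant used in this notebook: home runs count as hits and as balls in play.
hits <- singles + doubles + triples + home_runs
bip <- hits + fielders_choice + outs_in_play + sacrifices
babip_notebook <- hits / bip

# Variant closer to the conventional definition: home runs removed from both
# the numerator and the denominator.
babip_no_hr <- (hits - home_runs) / (bip - home_runs)

c(notebook = babip_notebook, no_hr = babip_no_hr)
```

Because hits are always fewer than balls in play, dropping home runs from both terms lowers the ratio, so the notebook's values will run somewhat higher than conventionally reported BABIP.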
Get each pitcher's BABIP for at-bats starting in each bin
babip_bin_pitcher_avgs <- sqldf("SELECT first_pitch_group, PitcherId, count(*) as numAtBats,
sum(cast(hits as float))/sum(cast(BIP as float)) AS babip_bin FROM lastPitches group by first_pitch_group, PitcherId ORDER BY count(*) DESC")
# Note: despite the "_ten_plus" name, this keeps pitcher-bin groups with at least 4 at-bats
babip_bin_pitcher_avgs_ten_plus <- babip_bin_pitcher_avgs[babip_bin_pitcher_avgs$numAtBats >= 4, ]
Get each pitcher's overall BABIP across all the batters they faced
babip_pitcher_avgs <- sqldf("SELECT PitcherId, count(*),
sum(cast(hits as float))/sum(cast(BIP as float)) AS babip_avg FROM lastPitches group by PitcherId ORDER BY count(*) DESC")
babip_pitcher_avgs
| PitcherId <dbl> | count(*) <int> | babip_avg <dbl> |
|---|---|---|
| 1000049176 | 490 | 0.35474006 |
| 1000036922 | 395 | 0.33884298 |
| 1000026610 | 391 | 0.37676056 |
| 1000085193 | 386 | 0.35121951 |
| 1000080401 | 373 | 0.38783270 |
| 1000049078 | 367 | 0.39639640 |
| 1000010353 | 367 | 0.33613445 |
| 1000058539 | 349 | 0.36575875 |
| 1000029326 | 347 | 0.33170732 |
| 1000051677 | 340 | 0.33160622 |
# Attach each pitcher's overall BABIP to their per-bin rows, joining explicitly on PitcherId
babipdf <- left_join(babip_bin_pitcher_avgs_ten_plus, babip_pitcher_avgs, by = 'PitcherId')
babipdf$diff_babip <- babipdf$babip_bin - babipdf$babip_avg
Linear regression of the BABIP difference against pitch bin
babip_diff_mod <- lm(diff_babip ~ first_pitch_group, data = babipdf, weights = numAtBats)
summary(babip_diff_mod)
##
## Call:
## lm(formula = diff_babip ~ first_pitch_group, data = babipdf,
## weights = numAtBats)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -2.08710 -0.38630 0.00311 0.34908 2.52238
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0029080 0.0019359 -1.502 0.1331
## first_pitch_group10-19 pitches 0.0017310 0.0031112 0.556 0.5780
## first_pitch_group20-29 pitches -0.0010394 0.0035231 -0.295 0.7680
## first_pitch_group30-39 pitches 0.0056509 0.0039906 1.416 0.1568
## first_pitch_group40-49 pitches 0.0067702 0.0044781 1.512 0.1306
## first_pitch_group50-59 pitches -0.0007593 0.0049852 -0.152 0.8789
## first_pitch_group60-69 pitches 0.0089646 0.0054537 1.644 0.1002
## first_pitch_group70-79 pitches 0.0144982 0.0062474 2.321 0.0203 *
## first_pitch_group80-89 pitches 0.0067045 0.0075672 0.886 0.3756
## first_pitch_group90-99 pitches 0.0081082 0.0108421 0.748 0.4546
## first_pitch_groupMore Than 100 Pitches 0.0113297 0.0175016 0.647 0.5174
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5593 on 21607 degrees of freedom
## (111 observations deleted due to missingness)
## Multiple R-squared: 0.0005639, Adjusted R-squared: 0.0001014
## F-statistic: 1.219 on 10 and 21607 DF, p-value: 0.2725
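Only the 70-79 bin clears the 0.05 threshold, and the overall F-test does not (p = 0.2725), so that single star may just reflect testing ten bins at once. One way to check, sketched below, is a Holm adjustment with base R's `p.adjust()`. The p-values are copied from the summary above; the shortened bin labels are just for display.

```r
# Bin p-values copied from the BABIP model summary (intercept excluded).
p_raw <- c(
  "10-19" = 0.5780, "20-29" = 0.7680, "30-39" = 0.1568, "40-49" = 0.1306,
  "50-59" = 0.8789, "60-69" = 0.1002, "70-79" = 0.0203, "80-89" = 0.3756,
  "90-99" = 0.4546, "100+" = 0.5174
)

# Holm's step-down adjustment controls the family-wise error rate.
p_holm <- p.adjust(p_raw, method = "holm")
round(p_holm, 3)
```

After adjustment the 70-79 p-value becomes 0.0203 × 10 = 0.203, so none of the bins is significant at the 0.05 level, which agrees with the overall F-test.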
as.data.frame(babip_diff_mod$coefficients)
| | babip_diff_mod$coefficients <dbl> |
|---|---|
| (Intercept) | -0.0029080077 |
| first_pitch_group10-19 pitches | 0.0017310108 |
| first_pitch_group20-29 pitches | -0.0010393972 |
| first_pitch_group30-39 pitches | 0.0056509158 |
| first_pitch_group40-49 pitches | 0.0067702383 |
| first_pitch_group50-59 pitches | -0.0007593253 |
| first_pitch_group60-69 pitches | 0.0089646357 |
| first_pitch_group70-79 pitches | 0.0144981573 |
| first_pitch_group80-89 pitches | 0.0067045477 |
| first_pitch_group90-99 pitches | 0.0081081554 |